Abstract

Many  spectral  detection  algorithms  require  precise  ground  truth  measurements 
that  are  hand-selected  in  the  image  to  apply  radiometric  calibration,  converting  image 
pixels  into  estimated  reflectance  vectors.  That  process  is  impractical  for  mobile, 
real-time  hyperspectral  target  detection  systems,  which  cannot  empirically  derive  a 
pixel-to-reflectance relationship from objects in the image. Implementing automatic
target  recognition  on  high-speed  snapshot  hyperspectral  cameras  requires  the  ability 
to  spectrally  detect  targets  without  performing  radiometric  calibration. 

This  thesis  demonstrates  human  skin  detection  on  hyperspectral  data  collected 
at  a  high  frame  rate  without  using  calibration  panels,  even  as  the  illumination  in 
the  scene  changes.  Compared  to  an  established  skin  detection  method  that  requires 
calibration  panels,  the  illumination-invariant  methods  in  this  thesis  achieve  nearly  as 
good  detection  performance  in  sunny  scenes  and  superior  detection  performance  in 
cloudy  scenes. 

The method in this thesis defines a pixel-to-reflectance relationship as an illumination
transform acting on a reflectance vector. The illumination transform is estimated
from a series of multi-environmental atmospheric radiative transfer function simulations.
Applied to dismount detection, this transform produces an estimated skin
signature that is used by three different hyperspectral detection algorithms. These
algorithms consistently achieve false alarm rates below 1% while detecting 80% of
skin pixels in a variety of illumination conditions. As the scene illumination changes
from sunny to cloudy in a sequence of test images, these algorithms maintain a high
average skin pixel detection rate with an AUC of over 95%.

SPECTRAL DETECTION OF HUMAN SKIN IN VIS-SWIR HYPERSPECTRAL IMAGERY WITHOUT RADIOMETRIC CALIBRATION

1.  Introduction 

Automatic  target  recognition  (ATR)  is  a  key  enabling  concept  in  the  defense 
community,  where  smart  sensors  are  deployed  to  detect  threats.  Such  threats  could 
include  vehicles,  obstacles,  or  humans.  Air  crews  and  ground  troops  use  infrared  and 
night-vision cameras to detect threats in low-light conditions. More recently, computers
have begun using these cameras to identify and track potential threats in imagery and
video.  Using  computers  for  image-based  threat  detection  requires  an  understanding  of 
the  characteristics  that  make  these  threats  unique.  For  example,  detecting  dismounts 
can  be  aided  by  focusing  on  the  unique  characteristics  of  human  skin. 

Previous  research  has  yielded  a  method  of  detecting  human  skin  in  hyperspectral 
imagery  that  requires  only  four  spectral  bands  instead  of  hundreds  [37].  That  research 
led  to  the  design  of  a  prototype  multispectral  camera  with  optical  filters  tuned  to 
view  those  four  spectral  bands  [40].  This  study  advances  hyperspectral  dismount 
detection  towards  becoming  a  deployable  capability  by  evaluating  its  performance 
under  unknown  outdoor  illumination  conditions. 

1.1  Problem  Statement 

Snapshot hyperspectral image (HSI) cameras have recently become available on
the market [55]. They capture visual information about a scene in two spatial dimensions,
across multiple spectral channels at a fast frame rate. Such data can provide
valuable information if it is processed effectively and efficiently. In particular, the
snapshot HSI camera has advantages as a new ATR sensor, as it captures high-resolution
information in spatial, spectral, and temporal dimensions. Threats can be
identified not only by their size, shape, and spectral signature, but also by their
motion across a scene.

Scanning  HSI  cameras  take  several  seconds  to  capture  each  datacube,  an  image 
with  spatial  height  and  width  and  spectral  depth.  Material  identification  algorithms 
developed  for  these  datacubes  have  the  luxury  of  taking  several  seconds  to  process  the 
data  and  identify  objects  by  their  spectral  signature  before  the  next  datacube  arrives 
from the camera. Snapshot HSI cameras produce datacubes at a faster rate, decreasing
the available computation time to perform detection between frames. Snapshot
HSI  cameras  require  computationally  fast  detection  algorithms  to  fully  utilize  their 
fast  frame  rates. 

Dismount  detection  algorithms  search  a  datacube  for  the  spectral  signature  of 
human  skin.  The  spectral  reflectance  of  human  skin  has  been  thoroughly  studied 
[37],  and  hyperspectral  skin  detection  algorithms  have  been  proposed  [6,  38].  A 
pair  of  calibration  panels  is  required  in  each  scene  to  convert  pixels  into  estimated 
reflectance  vectors  before  comparing  them  to  the  known  spectral  reflectance  of  human 
skin.  Calibration  accounts  for  factors  like  atmospheric  moisture  and  the  angle  of  the 
sun  in  the  sky. 

This  calibration  step  makes  this  skin  detection  algorithm  inappropriate  for  use 
with  snapshot  HSI  cameras  for  three  reasons.  First,  the  reference  objects  required  for 
calibration  may  not  appear  in  every  scene,  especially  if  the  camera  is  moving.  The 
second  reason  is  that  the  pixels  of  the  reference  objects  must  be  manually  selected 
for  each  datacube,  adding  considerable  processing  time.  The  third  reason  is  the 
processing  time  involved  in  calibrating  each  pixel,  which  contributes  to  the  overall 
computational  time  allotment. 

A real-time ATR sensor like the snapshot HSI camera needs a skin detection
algorithm  that  does  not  rely  on  reference  objects  and  minimizes  the  processing  time 
between  capturing  a  datacube  and  the  detection  of  human  skin.  It  should  be  invariant 
to  different  atmospheric  and  illumination  conditions.  This  will  save  time  by  not 
requiring  calibration  of  every  pixel  in  the  scene. 

1.2  Justification 

Using  a  specialized  skin  detection  algorithm  with  a  snapshot  HSI  camera  offers 
several  benefits.  These  algorithms  are  accurate,  robust  in  different  conditions,  and 
usable  in  real-time. 

Skin  detection  algorithms  combined  with  HSI  cameras  are  more  accurate  than 
the  leading  skin  detection  methods  for  color  cameras  [37].  Pixels  in  a  color  camera 
represent  light  in  a  red-green-blue  (RGB)  triad.  Pinkish-brown  objects  are  commonly 
misclassified as skin in RGB pixels. In contrast, HSI cameras sample the electromagnetic
spectrum across many channels, picking up features that distinguish skin from
other  objects  with  greater  accuracy  than  RGB  cameras. 

An  effective  skin  detection  method  must  be  robust  in  different  atmospheric  and 
illumination  conditions.  It  must  also  detect  skin  without  including  calibration  panels 
in  each  scene.  This  skin  detection  method  must  produce  accurate  results  in  minimal 
processing  time. 

Real-time threat detection relies on fast, efficient, and accurate human skin detection.
An ideal skin detection technique will minimize the number of computations
between  an  observed  image  and  a  classified  image.  This  necessitates  the  ability  to 
detect  skin  in  the  raw  image  data  instead  of  calibrated,  preprocessed  data. 

1.3 Assumptions


Using a skin detection camera for threat detection applications assumes that human
skin is an indicator of a human target. Conditions may arise in which a human
target has no skin exposed. For example, in very cold weather conditions a human
target may be covered in protective winter clothing. Since a skin detection camera
cannot see through objects, it will only detect exposed skin.

Although  threat  detection  could  be  applied  to  indoor  settings,  the  skin  detection 
camera  is  designed  for  natural  illumination  from  the  sun.  Sunlight  is  far  easier  to 
model  than  artificial  light  sources.  Numerous  software  models  can  generate  sunlight 
profiles for different atmospheric conditions. Artificial light sources, like street lamps
or vehicle headlights, vary too much across possible scenarios to be modeled consistently.
The  sun  is  a  common  light  source  that  can  be  reliably  modeled. 

The  need  for  sunlight  illumination  limits  the  skin  detection  camera  to  daytime 
use.  While  it  may  seem  trivial  to  detect  humans  in  daytime  imagery  with  the  naked 
eye,  computers  do  not  have  as  finely  tuned  target  recognition  as  people;  however,  with 
advanced  sensors,  automated  dismount  detection  is  becoming  a  reality.  Skin  detection 
gives  the  computer  a  starting  point  from  which  to  search  for  a  human  threat.  Future 
applications  may  adapt  the  skin  detection  camera  for  use  at  night  in  order  to  enhance 
night  vision  devices. 

1.4  Standards 

Illumination-invariant  skin  detection  methods  will  be  evaluated  on  their  accuracy 
across  different  scenarios. 

A common metric for detection accuracy is the comparison of the correct classification
rate to the incorrect classification rate. Correctly classifying a pixel as skin is
called a detection, and the probability of detection is given by the number of correctly
identified skin pixels out of the total number of skin pixels in the image. When a
background pixel, i.e. one that is not skin, is incorrectly classified as a skin pixel, it
is called a false alarm. The probability of false alarm is the number of background
pixels labeled as skin out of the total number of background pixels.
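
As an illustration only, the two rates defined above can be computed from a pair of Boolean masks. The following Python sketch is not part of the thesis software; the function name `detection_rates` and the toy 2x4 masks are assumptions made for this example.

```python
import numpy as np

def detection_rates(predicted, truth):
    """Return (P_D, P_FA) for two skin masks of equal shape (nonzero = skin)."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    n_skin = truth.sum()
    n_background = (~truth).sum()
    if n_skin == 0 or n_background == 0:
        raise ValueError("truth mask needs both skin and background pixels")
    # Probability of detection: correctly labeled skin pixels / all skin pixels.
    p_d = (predicted & truth).sum() / n_skin
    # Probability of false alarm: background labeled as skin / all background pixels.
    p_fa = (predicted & ~truth).sum() / n_background
    return float(p_d), float(p_fa)

# Hypothetical ground truth and detector output for a 2x4 image.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
predicted = np.array([[1, 1, 1, 0],
                      [1, 0, 0, 0]])
p_d, p_fa = detection_rates(predicted, truth)
print(p_d, p_fa)
```

Sweeping a detector threshold and recording each resulting pair of rates traces out the curve from which an area-under-the-curve figure is obtained.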

1.5  Approach 

In order to understand the appearance of human skin in HSI cameras, the information
represented in a hyperspectral pixel must be understood. First, a physical
model for sensor-reaching radiance is needed. Such a model describes the light arriving
at the camera from a scene. Next, the spectral characteristics of human skin are
examined.  Sensor-reaching  radiance  that  originates  from  human  skin  has  a  unique 
spectral  distribution.  It  is  affected  by  the  spectral  reflectance  of  skin  and  the  spectral 
distribution  of  the  light  source. 

After  characterizing  the  spectral  reflectance  of  skin,  the  scene  illumination  must 
be understood. The scope of this study is limited to outdoor, natural, daytime illumination;
therefore, the light being reflected off human skin originates from the
sun. Sunlight under different conditions is simulated by a radiative transfer computer
program. From these simulations, a common illumination estimate is derived
that  produces  an  estimated  skin  spectral  radiance  signature  when  combined  with  the 
known  spectral  reflectance  of  skin. 

The estimated skin spectral radiance signature is then applied to three hyperspectral
detection algorithms: the matched filter, the linear discriminant, and the
adaptive  coherence  estimator.  Data  is  collected  with  a  HyperSpecTIR3  camera  from 
a  scene  that  includes  a  human  subject  with  exposed  skin  under  natural  sunlight.  The 
data  includes  a  sunny  scene  and  a  cloudy  scene.  All  three  skin  detection  algorithms 
are  applied  to  the  images  and  the  results  are  compared. 

These algorithms maintain false alarm rates below 1% while detecting 80% of skin
pixels  across  all  test  images.  As  the  scene  illumination  changes  from  sunny  to  cloudy, 
these  algorithms  maintain  a  high  average  skin  pixel  detection  rate  with  an  AUC  of 
over  95%. 

1.6  Materials  and  Equipment 

Collecting a human skin spectral reflectance measurement requires a spectrometer
with a reflectance measurement contact probe. An Analytical Spectral Devices
(ASD) FieldSpec® Pro spectrometer is used for this research. A radiative transfer
code is needed to generate a large and diverse set of atmospheric and illumination
simulations. MODTRAN®5, a radiative transfer code often referenced in the
literature, is used for this research. Data processing software is also necessary for
this study; MATLAB® is used to analyze the simulation data. Finally,
the HyperSpecTIR®3 (HST3) camera is used for collecting data to evaluate the skin
detection techniques.

2. Background


This chapter reviews literature that explains important concepts in detecting
dismounts. A brief overview of dismount detection methods in Section 2.1
provides an understanding of spectral detection of human skin. A physical model is
presented in Section 2.2 that describes the optical path of light entering a camera
system to create an image. This is followed by a general explanation of how cameras
convert light into digital images in Section 2.3. General atmospheric correction
techniques and their importance are detailed in Section 2.4.

A  few  important  hyperspectral  detection  algorithms  are  introduced  in  Section  2.5, 
including  the  matched  filter,  the  linear  discriminant,  the  normalized  difference  index, 
and  the  adaptive  coherence  estimator.  This  chapter  concludes  with  a  discussion  on 
human skin detection methods in RGB images and HSI datacubes in Section 2.6.

2.1  Dismount  Detection 

A dismount is a human who is on the ground and not in a vehicle [19]. Dismount
detection and tracking have a variety of military and civilian applications. Military
uses include crowd monitoring, weapon targeting, search and rescue, and collateral
damage avoidance. Civilian uses include industrial safety, security systems, improved
computer interfaces, and biometric identification. A variety of sensors are used in
dismount detection: RGB cameras, HSI cameras, thermal infrared sensors, and synthetic
aperture radar (SAR). Dismount detection may employ temporal, spatial, or
spectral detection techniques [29].

Temporal  techniques  rely  on  the  unique  motion  of  human  bodies.  In  SAR,  moving 
dismounts  are  detected  by  their  distinctive  gait,  breathing,  and  limb  movement,  each 
of which reflects a measurable Doppler signature [19, 57]. Video cameras can also
detect moving dismounts by the distinct periodicity of walking and running [36].

Spatial  techniques  for  dismount  detection  often  compare  the  observed  scene  to 
a template of the expected shape of the dismount. This can be accomplished with
a histogram of oriented gradients (HOG), or by searching for human-shaped geometric
figures [13]. To reduce the search space and computational time, spatial dismount
detectors are often cued to search in areas that consist of human-colored pixels; this
cueing step is called spectral detection.

Spectral dismount detection techniques exploit the known spectral signature of dismounts.
Camera-based dismount detectors differentiate spectral signatures between
pixels and mark areas of interest for spatial and temporal techniques. Depending on
sensor-to-target  range,  the  dismount  may  not  occupy  enough  pixels  to  be  detected 
spatially;  however,  the  spectral  detection  technique  is  still  a  viable  detection  method. 

2.2  Sensor-Reaching  Radiance 

In  order  to  develop  a  spectral  detector,  it  is  important  to  understand  the  nature 
of  imaging  spectroscopy.  The  remote  sensing  community  is  concerned  with  images  of 
the  ground  taken  from  an  airborne  or  space-borne  HSI  sensor.  In  this  context,  light 
traveling  from  the  Sun  to  the  Earth  passes  through  the  Earth’s  atmosphere,  where  it  is 
transmitted,  scattered,  and  absorbed  at  different  rates  depending  on  the  wavelength. 
Light  that  reaches  through  the  atmosphere  strikes  the  surface  of  an  object,  where  it 
is  again  absorbed  and  reflected  at  different  rates  for  different  wavelengths  depending 
on  the  material  properties.  Reflected  light  from  the  surface  of  the  object  travels  back 
up  through  the  atmosphere,  encountering  additional  scattering  and  absorption  before 
it  is  collected  by  the  aperture  of  the  sensor.  This  light  energy  at  the  sensor  is  called 
apparent radiance or sensor-reaching radiance (SRR).

Figure 1. Sensor-reaching radiance in the visible to near-infrared spectrum is composed
of reflected direct sunlight ($L_A$), scattered sunlight ($L_B$), multiple-bounce reflected light
from the background ($L_G$), the adjacency effect ($L_I$), and path radiance ($L_C$).

SRR is composed of light rays reflected from the scene, as well as light scattered from
the air molecules in front of the sensor. The latter is called path radiance, and is
dependent on the atmosphere and illumination geometry. Path radiance and other
factors that comprise SRR are illustrated in Fig. 1. The governing equation for SRR
in the visible to infrared spectrum can be expressed as [17, 46]:

$$L = L_A + L_B + L_C + L_G + L_I, \tag{1}$$

where $L$ represents the total SRR, $L_A$ is the direct sunlight reflected off the target,
$L_B$ is sunlight scattered by the atmosphere and reflected off the target, $L_C$ is the
path radiance, $L_G$ is a multiple-bounce of light from the background to the target to
the sensor, and $L_I$ is scattered light reflected off the adjacent background. Although
each of these terms is a function of wavelength $\lambda$, the functional notation has been
abbreviated in Eq. (1) for compactness.

The last term in Eq. (1), $L_I$, accounts for the adjacency effect, where light is
reflected from an object near the target and is scattered into the sensor-target path by
the atmosphere, making it appear to come from the target. It is sometimes regarded
as negligible, while some authors do include it in their models [1, 33, 34]. For the
purpose of this study, the adjacency effect will be ignored.

Multiple-bounce reflected background light, $L_G$, is difficult to model [17]. In cases
where the target is exposed to sunlight ($L_A$) or skylight ($L_B$), the reflected background
term $L_G$ is typically dominated by $L_A$ and $L_B$. Thus, for ease of calculation, the
reflected background $L_G$ will also be ignored.

Both $L_A$ and $L_B$ incorporate light that has been attenuated by the atmosphere
and reflected off the target towards the sensor. For objects in direct light, $L_A$ has a
much greater contribution than $L_B$ in Eq. (1). However, for an object in the shade
where $L_A$ is occluded, $L_A$ goes to zero and $L_B$ represents all of the reflected light
from the object. Both of these terms will be described here.

2.2.1  Direct  Reflected  Sunlight 

Direct reflected light ($L_A$) at the sensor, as a function of wavelength ($\lambda$), can be
written as [46]:

$$L_A(\lambda) = E'_s(\lambda) \cos\sigma' \, \frac{r(\lambda)}{\pi} \, \tau_1(\lambda) \tau_2(\lambda), \tag{2}$$

where $E'_s(\lambda)$ is the irradiance distribution of the sunlight measured at the top of
the Earth’s atmosphere (illustrated in Fig. 2), $\sigma'$ is the angle from the object to the
sun relative to the surface normal of the object, $r(\lambda)$ is the spectral reflectance of
the object, $\tau_1(\lambda)$ is the atmospheric transmittance along the sun-to-ground path, and
$\tau_2(\lambda)$ is the atmospheric transmittance along the ground-to-sensor path. Since Eq. (2)
applies to objects facing the sun, the angle $\sigma'$ is limited to values between 0 and $\pi/2$
such that $0 \le \cos\sigma' \le 1$.

The spectral reflectance ($r(\lambda)$) is the ratio of reflected to incident light from the
surface of an object [46]. The spectral reflectance is a unitless measure ranging
from zero to one as a function of wavelength. In general, the radiance of reflected
light changes with viewing angle according to the object’s bidirectional reflectance
distribution function (BRDF) [46]. For objects where viewing angle changes the
spectral distribution of $r(\lambda)$ at different wavelengths, the BRDF is an important
factor in SRR equations. However, accounting for variability in sun angle, sensor-to-target
angle, and surface normal adds significant complexity. For simplicity, this
study will assume that the BRDFs of all targets are uniform with respect to scene
geometry. That is, light incident on a surface will be reflected diffusely, with equal
magnitude in all directions. This is represented in Eq. (2) by the factor $1/\pi$, as reflected
light is distributed equally over the entire reflectance hemisphere of $\pi$ steradians.

Figure 2. Solar irradiance distribution incident on the Earth’s surface ($E'_s(\lambda)\tau_1(\lambda)$ in
Eq. (2)) generated in MODTRAN®5, assuming a solar zenith angle of 10 degrees from
nadir and a mid-latitude summer standard atmosphere.

2.2.2  Reflected  Skylight 

Reflected skylight ($L_B$) at the sensor, as a function of wavelength ($\lambda$), can be
written as [46]:

$$L_B(\lambda) = F E_{ds}(\lambda) \frac{r(\lambda)}{\pi} \tau_2(\lambda), \tag{3}$$

where $F$ is a form factor ranging from 0 to 1 representing the fraction of the total
sky exposed to the object, $E_{ds}(\lambda)$ is the average downwelled skylight in the hemisphere
above the object, and $r(\lambda)$ and $\tau_2(\lambda)$ are the same as in Eq. (2). Combining Eqs. (1),
(2), and (3), and dropping the $L_G$ and $L_I$ terms, results in a simplified model for
SRR [33]:

$$L(\lambda) = \left[ E'_s(\lambda) \cos\sigma' \, \tau_1(\lambda) + F E_{ds}(\lambda) \right] \frac{\tau_2(\lambda)}{\pi} r(\lambda) + L_C(\lambda), \tag{4}$$

which can be written compactly [31] as:

$$L(\lambda) = L_R(\lambda) r(\lambda) + L_C(\lambda). \tag{5}$$

The physical model summarized in Eq. (5) describes SRR as a linear combination
of reflected sunlight ($L_R(\lambda)$) and path radiance ($L_C(\lambda)$). Information about the objects
in the scene is carried by the signal $L_R(\lambda) r(\lambda)$, while the path radiance $L_C(\lambda)$ is
independent of the objects in the scene.
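
To make the structure of Eqs. (4) and (5) concrete, the following Python sketch evaluates the simplified SRR model at three wavelengths for a sunlit and a shaded surface. It is an illustration under stated assumptions: every numeric value is made up for the example (none comes from MODTRAN or from the thesis data), and the function names and the `sunlit` flag are hypothetical.

```python
import numpy as np

def illumination_term(E_s, cos_sigma, tau1, tau2, F, E_ds, sunlit=True):
    """L_R from Eq. (4): [E'_s cos(sigma') tau_1 + F E_ds] tau_2 / pi.

    When the direct sun is occluded (sunlit=False) the L_A contribution
    is set to zero, leaving only the skylight term.
    """
    direct = E_s * cos_sigma * tau1 if sunlit else 0.0
    return (direct + F * E_ds) * tau2 / np.pi

def sensor_reaching_radiance(L_R, r, L_C):
    """Eq. (5): L = L_R r + L_C."""
    return L_R * r + L_C

# Made-up spectra sampled at three wavelengths (arbitrary units).
E_s = np.array([1800.0, 1000.0, 300.0])   # top-of-atmosphere solar irradiance
tau1 = np.array([0.70, 0.85, 0.90])       # sun-to-ground transmittance
tau2 = np.array([0.95, 0.97, 0.98])       # ground-to-sensor transmittance
E_ds = np.array([200.0, 60.0, 10.0])      # downwelled skylight
L_C = np.array([5.0, 1.5, 0.2])           # path radiance
r = np.array([0.3, 0.5, 0.2])             # target reflectance
F = 1.0                                   # full sky visible
cos_sigma = np.cos(np.radians(30.0))

L_R_sun = illumination_term(E_s, cos_sigma, tau1, tau2, F, E_ds, sunlit=True)
L_R_shade = illumination_term(E_s, cos_sigma, tau1, tau2, F, E_ds, sunlit=False)
L_sun = sensor_reaching_radiance(L_R_sun, r, L_C)
L_shade = sensor_reaching_radiance(L_R_shade, r, L_C)
print(L_sun, L_shade)
```

With zero reflectance the model returns only the path radiance, which matches the statement that $L_C(\lambda)$ is independent of the objects in the scene.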

2.2.3  Electromagnetic  Spectrum 

Light that humans can see with the naked eye comprises a segment of the electromagnetic
spectrum known as the visible (VIS) spectrum. Light with wavelengths
from about $\lambda = 0.39$ μm to $\lambda = 0.75$ μm falls within the VIS spectrum. Shorter
wavelengths, including ultraviolet (UV) light and X-rays, are invisible to humans. Some
cameras respond to UV light, but HSI cameras usually capture longer wavelengths.

Infrared light has longer wavelengths than visible light, and is separated into different
sub-regions, shown in Fig. 3. The near infrared (NIR) and short-wave infrared
(SWIR) regions are separated by an atmospheric absorption band at 1.4 μm caused
by water vapor. HSI cameras designed to capture reflected light typically respond to
light in the VIS to SWIR spectrum. Thermal imagers capture infrared light at much
longer wavelengths: 3-8 μm for mid-wave infrared, and 8-15 μm for long-wave infrared.

Figure 3. Atmospheric transmittance ($\tau_1(\lambda)$) of the electromagnetic spectrum from
$\lambda = 0.3$ μm to $\lambda = 2.5$ μm. Transitions between infrared sub-divisions are marked by
absorption bands that are due to water vapor, ozone, carbon dioxide, and other atmospheric
aerosols.

2.3  Camera  Response 

When the SRR enters the aperture of the camera, it propagates through a set of
optics and filters before striking a focal plane to form an image [46]. Each pixel of
an image, $\mathbf{x} = [x_1, \ldots, x_N]^T$, comprises $N$ channels. The value of each channel
is proportional to the SRR as processed by a different spectral filter. The equation
that describes $\mathbf{x}$ in terms of SRR from Eq. (5) is [46]:

$$x_i = \int_0^{t_{exp}} \int_\lambda L(\lambda) C_i(\lambda) \, d\lambda \, dt, \tag{6}$$

where $x_i$ is the digital number value of the $i$th channel of pixel $\mathbf{x}$, $t_{exp}$ is the exposure
time, $L(\lambda)$ is the SRR from Eq. (5), $C_i(\lambda)$ is the spectral response of the $i$th channel
filter, and $\lambda$ is wavelength. Equation (6) ignores camera noise that may affect the
pixel values, such as shot noise or thermal noise.

Table 1. Channel responses of common imaging sensors

Camera               Type       Spectral Range   Channels   Channel FWHM
Common RGB camera    VIS        0.39-0.75 μm     3          ~0.1 μm
CAP’s ARCHER         VIS-NIR    0.5-1.1 μm       52         12 nm
NRL’s HYDICE         VIS-SWIR   0.4-2.5 μm       210        10 nm
NASA’s AVIRIS        VIS-SWIR   0.4-2.5 μm       224        9.5 nm
SpecTIR’s HST3       VIS-SWIR   0.45-2.45 μm     227        8.2 nm - 12 nm

Usually the exposure time ($t_{exp}$) is too short to capture temporal changes in SRR.
Since many cameras are able to automatically set the exposure time to prevent saturation,
it is assumed that $t_{exp}$ is set such that the sensor operates in its linear region.
Thus, $t_{exp}$ can be factored out of the integral and treated as a constant:

$$x_i = t_{exp} \int_\lambda L(\lambda) C_i(\lambda) \, d\lambda. \tag{7}$$

Channel spectral response $C_i(\lambda)$ incorporates the spectral response of the optical
filter, the throughput efficiency of the camera optics, and the sensitivity of the film
or charge-coupled device (CCD). Each of the $N$ channels has a different spectral
response $C_i(\lambda)$; therefore, the pixel $\mathbf{x}$ is a sampling of different parts of the spectrum of
$L(\lambda)$. Hyperspectral cameras often have channel responses that are Gaussian shaped.
The spectral bandwidth of a Gaussian filter is described by its full width at half maximum
(FWHM). For channels with a narrow FWHM, $C_i(\lambda)$ approximates a weighted delta
function $\delta(\lambda - \lambda_i)$, in which case Eq. (7) is reduced to [31]:

$$x_i \approx c_i L(\lambda_i), \tag{8}$$

with the constant $c_i$ replacing the constants for exposure time and filter width, and
where $\lambda_i$ is the center wavelength of channel filter $C_i(\lambda)$. Omitted from Eq. (8) is a
quantization noise term that is a consequence of storing the result $x_i$ as a hardware-defined
data type. For cameras with narrow FWHM filters, Eq. (8) provides a convenient
linear relationship between SRR ($L(\lambda)$) and pixel values ($\mathbf{x}$). Table 1 lists the
channel responses for different types of cameras.
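
The quality of the narrowband approximation can be checked numerically. The sketch below compares the integral in Eq. (7) with the point sample in Eq. (8) for a Gaussian channel response, once with a 10 nm FWHM and once with a 100 nm FWHM. It rests on assumptions: the smooth radiance curve, the exposure time, and the channel center are invented for illustration, and $c_i$ is taken to be $t_{exp}$ times the integrated channel response.

```python
import numpy as np

wavelengths = np.linspace(0.4, 2.5, 21001)   # micrometers, 0.0001 um spacing
d_lambda = wavelengths[1] - wavelengths[0]
t_exp = 0.01                                 # hypothetical exposure time

def radiance(lam):
    """Made-up smooth SRR curve; not derived from any measurement."""
    return 100.0 * np.exp(-((lam - 0.6) / 0.8) ** 2) + 5.0

def gaussian_response(center, fwhm):
    """Gaussian channel response C_i sampled on the wavelength grid."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - center) / sigma) ** 2)

def narrowband_error(center, fwhm):
    """Relative difference between Eq. (7) and Eq. (8) for one channel."""
    C_i = gaussian_response(center, fwhm)
    # Eq. (7): Riemann-sum approximation of the wavelength integral.
    x_exact = t_exp * np.sum(radiance(wavelengths) * C_i) * d_lambda
    # Eq. (8): constant c_i times the radiance at the center wavelength.
    c_i = t_exp * np.sum(C_i) * d_lambda
    x_approx = c_i * radiance(center)
    return abs(x_exact - x_approx) / x_exact

err_narrow = narrowband_error(1.08, 0.010)   # 10 nm FWHM, hyperspectral-like
err_wide = narrowband_error(1.08, 0.100)     # 100 nm FWHM, RGB-like
print(err_narrow, err_wide)
```

For a radiance curve that varies slowly across the channel, the error grows with filter width. Sharp spectral features, such as absorption bands, would make the wide-filter error larger than this smooth example suggests.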

2.4  Atmospheric  Correction 

The spectral reflectance ($r(\lambda)$) component of SRR carries information about objects
in the scene, as described in Section 2.2; therefore, it is the term used to perform
spectral identification. The illumination components of SRR, $L_R(\lambda)$ and $L_C(\lambda)$ in
Eq. (5), must be factored out to estimate the object’s spectral reflectance.

2.4.1  Radiative  Transfer  Codes 

Radiative  transfer  (RT)  codes  such  as  MODTRAN  [4],  FLAASH  [34],  HATCH 
[41],  and  ATREM  [15],  take  in  a  set  of  atmospheric  parameters  and  generate  the 
terms  in  Eq.  (4).  These  are  then  used  to  factor  out  atmospheric  effects  in  an  image 
and  estimate  the  spectral  reflectances  of  objects. 

The input parameters to RT codes are often based on atmospheric measurements
at the time the image was taken. Measurements can be taken with meteorological
sensors or with precision laser instruments [11]. When direct measurements are
not available, the input parameters can be estimated from past measurements of
the atmosphere. Mis-estimating the atmospheric parameters in RT codes can cause
diminished performance in spectral detection algorithms [56].

2.4.2  Empirical  Line  Method 

$L_R(\lambda)$ and $L_C(\lambda)$ can be deduced by observing the appearance of objects in the
scene that have known spectral reflectances. Instead of requiring a complete atmospheric
measurement at the time the image is taken, this method requires a “ground-truth”
spectral reflectance measurement of objects in the scene.

A well-known implementation of the process mentioned above is the Empirical
Line Method (ELM) [25, 48]. Adopting the narrowband assumption from Section
2.3, ELM is a combination of Eq. (5) and Eq. (8) that provides a linear mapping
from pixel values ($x_i$) to the estimated spectral reflectance ($r'_i$).

Applying the SRR model in Eq. (5) to the pixel simplification in Eq. (8) gives

$$x_i = c_i L_R(\lambda_i) r'_i + c_i L_C(\lambda_i), \tag{9}$$

which is solved for $r'_i$:

$$r'_i = \frac{1}{c_i L_R(\lambda_i)} x_i - \frac{L_C(\lambda_i)}{L_R(\lambda_i)}. \tag{10}$$

Equation (10) in linear form is $r'_i = m_i x_i + b_i$, where the slope $m_i$ and offset $b_i$ are
functions of reflected and upwelled radiance through the $i$th channel. By selecting
two pixels ($\mathbf{x}_w$, $\mathbf{x}_g$) that correspond to white and gray calibration panels with known
spectral reflectances ($r_w(\lambda)$, $r_g(\lambda)$), and assuming that the same illumination conditions
$L_R(\lambda)$ and $L_C(\lambda)$ apply to all pixels in the scene, the values of $m_i$ and $b_i$ are
determined for the $i$th channel by solving the linear equation:

$$\begin{bmatrix} r_w(\lambda_i) \\ r_g(\lambda_i) \end{bmatrix} = \begin{bmatrix} x_{w,i} & 1 \\ x_{g,i} & 1 \end{bmatrix} \begin{bmatrix} m_i \\ b_i \end{bmatrix}. \tag{11}$$

Once $m_i$ and $b_i$ are found for all $N$ channels in Eq. (11), each pixel $\mathbf{x}$ in the image
can be converted to estimated spectral reflectance by applying the ELM transform:

$$\mathbf{r}' = \mathbf{M}\mathbf{x} + \mathbf{b}, \tag{12}$$

where $\mathbf{r}' = [r'(\lambda_1), \ldots, r'(\lambda_N)]^T$, $\mathbf{x} = [x_1, \ldots, x_N]^T$, $\mathbf{b} = [b_1, \ldots, b_N]^T$, and $\mathbf{M}$ is a
diagonal matrix with entries equal to each slope $m_i$:

$$\mathbf{M} = \begin{bmatrix} m_1 & 0 & \cdots & 0 \\ 0 & m_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & m_N \end{bmatrix}. \tag{13}$$

The result of Eq. (12) is only an estimate of the spectral reflectance because of
the inaccuracies introduced by the filter bandwidth $c_i(\lambda)$ and imaging constants $c_i$.
Also,  the  unknown  orientation  of  objects  in  the  scene  makes  a  full  BRDF  estimate 
very  difficult,  so  ELM  must  assume  that  all  objects  are  diffuse. 
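
The two-panel fit in Eq. (11) and the transform in Eq. (12) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the implementation used in this work: the function names `elm_fit` and `elm_apply` are hypothetical, and the synthetic scene (imaging constants, radiance terms, panel reflectances of 0.9 and 0.2) is made up so that the pixels follow Eq. (9) exactly.

```python
import numpy as np

def elm_fit(x_white, x_gray, r_white, r_gray):
    """Solve Eq. (11) in every channel for the slope m_i and offset b_i."""
    x_white, x_gray = np.asarray(x_white, float), np.asarray(x_gray, float)
    r_white, r_gray = np.asarray(r_white, float), np.asarray(r_gray, float)
    m = (r_white - r_gray) / (x_white - x_gray)  # slope of the two-point line
    b = r_white - m * x_white                    # offset so the white panel maps exactly
    return m, b

def elm_apply(pixels, m, b):
    """Eq. (12): r' = M x + b. M is diagonal, so this is an elementwise product."""
    return np.asarray(pixels, float) * m + b

# Synthetic scene (all values are assumptions): pixels follow Eq. (9).
rng = np.random.default_rng(0)
n_channels = 5
c = rng.uniform(50.0, 100.0, n_channels)   # imaging constants c_i
L_R = rng.uniform(0.5, 2.0, n_channels)    # reflected-radiance term
L_C = rng.uniform(0.01, 0.1, n_channels)   # path-radiance term

def to_pixel(r):
    return c * L_R * r + c * L_C

r_white = np.full(n_channels, 0.9)
r_gray = np.full(n_channels, 0.2)
m, b = elm_fit(to_pixel(r_white), to_pixel(r_gray), r_white, r_gray)

r_true = rng.uniform(0.0, 1.0, n_channels)
r_est = elm_apply(to_pixel(r_true), m, b)   # recovers r_true in this idealized case
```

In this idealized setting the fitted slope and offset equal the coefficients of Eq. (10). With real data the recovery is only approximate, for the reasons given above.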


2.5  Spectral  Detection 

The  steps  involved  in  spectral  detection  can  be  described  as  an  algorithm  chain  as 
shown  in  Fig.  4  [16].  Raw  image  data,  a  collection  of  pixels  (x)  of  a  scene,  undergoes  a 
series  of  preprocessing  stages  prior  to  the  application  of  the  target  detection  algorithm. 
These  steps  may  include  atmospheric  compensation,  dimensionality  reduction,  noise 
whitening,  background  characterization,  and  adaptive  filter  generation.  Each  step  is 
intended  to  improve  the  accuracy  and  robustness  of  the  detection  algorithm  to  be 
applied,  often  at  the  cost  of  added  complexity  and  computation  time.  In  real-time 
target  detection,  the  algorithm  chain  must  maximize  both  computational  speed  and 
detection  accuracy. 

Hyperspectral detection algorithm chains may be divided into two groups: those
that employ radiance-based detection, and those that employ reflectance-based detection.
Reflectance-based detection includes atmospheric compensation as one of its
preprocessing steps, producing an estimated reflectance vector (r') out of each pixel
(x) as described in Section 2.4. A target detection algorithm searches these estimated
reflectance vectors for the known spectral reflectance of the target.

Figure 4. Common steps in the algorithm chain for spectral detection as presented in
[16].

Radiance-based  detection  skips  the  atmospheric  compensation  step  and  performs 
other preprocessing steps, such as noise whitening and normalization, prior to applying
the detection algorithm. Ientilucci et al. have claimed that reflectance-based and
radiance-based  methods  perform  comparably  well,  as  long  as  atmospheric  effects  are 
considered  at  some  point  in  the  algorithm  chain  [21]. 

Many different detection algorithms exist for hyperspectral data [31]. In general,
they compare an observation, x or r', to a target spectral signature, expected
distribution of signatures, or a set of detection rules, in order to classify the observation.
Some detection algorithms do not require prior knowledge of any spectral
signatures in the scene. These anomaly detectors highlight regions of interest in the
image where the spectrum is statistically distinct from the rest of the background
[50]. Other detectors use a priori knowledge of specific signatures either taken from
a spectral library [8] or measured directly [53].

For  target  recognition  purposes,  hyperspectral  detection  algorithms  of  interest  are 
those  that  search  for  a  specific  target  signature.  These  detection  methods  compare 
each  pixel  in  an  image  to  a  known  target  signature,  and  label  the  pixels  with  a 
score  based  on  their  similarity  to  the  target  signature.  Comparing  the  score  to  a 
threshold ($\gamma$) results in pixel classification. The performance of a binary classifier,
which classifies each pixel as either target or non-target, is evaluated by the probability
of detection ($P_d$) and the probability of false alarm ($P_{fa}$). Setting different thresholds
($\gamma$) on the score gives different ($P_d$, $P_{fa}$) pairs, and plotting those pairs makes a
receiver operating characteristic (ROC) curve.
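
As a rough sketch of how the ($P_d$, $P_{fa}$) pairs are obtained, the snippet below sweeps a threshold over a small set of scores with known labels. The function name `roc_points`, the scores, and the labels are all illustrative assumptions.

```python
import numpy as np

def roc_points(scores, is_target, thresholds):
    """Return (P_fa, P_d) arrays with one entry per threshold gamma."""
    scores = np.asarray(scores, float)
    is_target = np.asarray(is_target, bool)
    p_d, p_fa = [], []
    for gamma in thresholds:
        detected = scores > gamma                      # pixel classification
        p_d.append(np.mean(detected[is_target]))      # fraction of targets detected
        p_fa.append(np.mean(detected[~is_target]))    # fraction of non-targets detected
    return np.array(p_fa), np.array(p_d)

# Made-up scores: three target pixels followed by five background pixels.
scores = np.array([0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6])
labels = np.array([1, 1, 1, 0, 0, 0, 0, 0], dtype=bool)
p_fa, p_d = roc_points(scores, labels, thresholds=[0.0, 0.5, 0.65, 1.0])
```

Plotting `p_d` against `p_fa` for a dense sweep of thresholds traces out the ROC curve.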

The following sections describe basic detection methods that will be used in this
study.

2.5.1  Linear  Discriminant


Fisher’s  linear  discriminant  is  a  classification  method  for  multidimensional  data 
[12].  It  is  an  optimal  classifier  for  a  two-class  problem  set  for  which  the  first  and 
second  order  statistical  moments  are  known  [10].  The  linear  discriminant  analysis 
(LDA)  method  projects  multidimensional  data  down  to  a  single  dimension,  along 
which  the  two  classes  are  most  separated. 

The LDA equation for a pixel vector (x) is:

$$\gamma \underset{H_0}{\overset{H_1}{\lessgtr}} \mathbf{w}^T\mathbf{x}, \tag{14}$$

which classifies a pixel as either target (positive hypothesis $H_1$) or background (null
hypothesis $H_0$) by applying a threshold $\gamma$ to the result of a dot product $\mathbf{w}^T\mathbf{x}$. The
vector $\mathbf{w}$ is normal to the hyperplane separating the target and background pixel
distributions in multidimensional space. The optimal $\mathbf{w}$ is given by [3]:

$$\mathbf{w} = (\Sigma_b + \Sigma_t)^{-1}(\mu_t - \mu_b), \tag{15}$$

where $\Sigma_b$ is the background covariance matrix, $\Sigma_t$ is the target covariance matrix, $\mu_b$
is the mean background pixel, and $\mu_t$ is the mean target pixel.

Target statistics $\Sigma_t$ and $\mu_t$ can be found by hand-selecting target pixels from
a database of images. Background statistics $\Sigma_b$ and $\mu_b$ are harder to define unless
specific scenes and conditions are chosen. With empirical data, the background statistics
can be computed from all of the remaining pixels after target pixels have been
removed.
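
A minimal sketch of Eqs. (14) and (15) is given below, assuming labeled target and background pixels are available as rows of two arrays. The names `lda_weights` and `lda_score`, the synthetic three-channel data, and the midpoint choice of threshold are assumptions made for illustration only.

```python
import numpy as np

def lda_weights(target_pixels, background_pixels):
    """Eq. (15): w = (Sigma_b + Sigma_t)^-1 (mu_t - mu_b). Rows are pixels."""
    mu_t = target_pixels.mean(axis=0)
    mu_b = background_pixels.mean(axis=0)
    sigma_t = np.cov(target_pixels, rowvar=False)
    sigma_b = np.cov(background_pixels, rowvar=False)
    # Solve the linear system rather than forming an explicit inverse.
    return np.linalg.solve(sigma_b + sigma_t, mu_t - mu_b)

def lda_score(pixels, w):
    """Right-hand side of Eq. (14): the dot product w^T x for every pixel."""
    return pixels @ w

# Synthetic, well-separated classes (values are made up).
rng = np.random.default_rng(1)
background = rng.normal(loc=[1.0, 1.0, 1.0], scale=0.1, size=(500, 3))
target = rng.normal(loc=[1.0, 1.5, 0.5], scale=0.1, size=(200, 3))

w = lda_weights(target, background)
# One simple threshold choice: midway between the two mean scores.
gamma = 0.5 * (lda_score(target, w).mean() + lda_score(background, w).mean())
is_target = lda_score(target, w) > gamma
```

Because the projection is a single dot product per pixel, the per-pixel cost at detection time is small once $\mathbf{w}$ has been computed.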

2.5.2  Matched  Filter


The  matched  filter  (MF)  algorithm  is  a  well-known  hyperspectral  detection  method. 
Because  of  its  simplicity,  it  is  often  used  as  a  baseline  for  comparison  when  authors 
propose more complex detection methods. The MF equation is:

$$d_{MF}(\mathbf{x}) = \frac{\mathbf{s}^T\mathbf{x}}{\|\mathbf{s}\|\,\|\mathbf{x}\|}, \tag{16}$$

where $d_{MF}(\mathbf{x})$ is the score given to a pixel (x), $\mathbf{s}$ is the target signature, and the $\|\cdot\|$
operation is the $\ell_2$ norm of a vector.

The score is maximum at $d_{MF}(\mathbf{x}) = 1$ for a perfect match, and minimum at
$d_{MF}(\mathbf{x}) = 0$ for orthogonal vectors. The MF method is invariant to changes in
magnitude because it normalizes each pixel, by dividing by the $\ell_2$ norm, before taking
the dot product with the normalized target signature.

A common shorthand notation for a normalized vector is $\hat{\mathbf{x}} = \mathbf{x}/\|\mathbf{x}\|$. Using this
notation, the MF formula is:

$$d_{MF}(\mathbf{x}) = \hat{\mathbf{s}}^T\hat{\mathbf{x}}, \tag{17}$$

where $\hat{\mathbf{x}}$ is a normalized pixel and $\hat{\mathbf{s}}$ is the normalized target signature.

The MF method can be used for both radiance-based and reflectance-based detection.
In reflectance-based detection, the target signature is a reflectance vector and
the  pixels  are  the  estimated  reflectance  vectors  produced  by  atmospheric  correction. 
In  radiance-based  detection,  the  target  signature  is  the  expected  value  of  a  target 
pixel  taken  from  the  unprocessed  HSI  datacube. 
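
The normalized dot product described above can be sketched as follows. The function name `matched_filter` and the three example pixels are assumptions chosen so that the behavior at the two extremes (perfect match, orthogonal vector) and the invariance to magnitude are easy to see.

```python
import numpy as np

def matched_filter(pixels, signature):
    """Dot product of the l2-normalized pixels (rows) with the normalized signature."""
    pixels = np.asarray(pixels, float)
    signature = np.asarray(signature, float)
    s_hat = signature / np.linalg.norm(signature)
    x_hat = pixels / np.linalg.norm(pixels, axis=-1, keepdims=True)
    return x_hat @ s_hat

s = np.array([3.0, 4.0, 0.0])
pixels = np.array([
    [3.0, 4.0, 0.0],     # identical to the signature
    [30.0, 40.0, 0.0],   # same direction, ten times brighter
    [0.0, 0.0, 5.0],     # orthogonal to the signature
])
scores = matched_filter(pixels, s)
```

The first two pixels receive the same score even though their magnitudes differ by a factor of ten, which is the magnitude invariance noted above.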


2.5.3  Adaptive  Coherence  Estimator 

The Adaptive Coherence Estimator (ACE) detector relies on a geometric interpretation
of hyperspectral data. It finds the squared cosine of the angle between a
pixel and a target signature [45]. Like the LDA detector, the ACE detector adapts to
each image by accounting for the covariance of the image data. The ACE equation
is:

$$d_{ACE}(\mathbf{x}) = \frac{(\hat{\mathbf{s}}^T\Sigma_b^{-1}\hat{\mathbf{x}})^2}{(\hat{\mathbf{s}}^T\Sigma_b^{-1}\hat{\mathbf{s}})(\hat{\mathbf{x}}^T\Sigma_b^{-1}\hat{\mathbf{x}})}, \tag{18}$$

where $d_{ACE}(\mathbf{x})$ is the score given to a pixel x, $\hat{\mathbf{s}}$ is the normalized target signature, $\hat{\mathbf{x}}$ is
a normalized pixel, and $\Sigma_b$ is the background covariance matrix, which can be estimated
by taking the covariance of all normalized pixels in an image about their mean [28].

Table 2. Common Normalized Difference Indices and their associated wavelengths

Material          Abbreviation  λ1 (µm)   λ2 (µm)
Vegetation        NDVI          0.6-0.7   0.75-1.35
Vegetative Water  NDWI          0.86      1.24
Snow              NDSI          0.555     1.64
Human Skin        NDSI          1.08      1.58
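
A sketch of the ACE score follows, under the assumption that the background covariance is estimated from all normalized pixels of the image as described above. The function name `ace_scores` and the random four-channel test image are illustrative, and no mean is subtracted from the pixel or signature, matching the form of the equation as written here.

```python
import numpy as np

def ace_scores(pixels, signature):
    """ACE score for every row of `pixels`, with Sigma_b estimated from the image."""
    pixels = np.asarray(pixels, float)
    signature = np.asarray(signature, float)
    x_hat = pixels / np.linalg.norm(pixels, axis=1, keepdims=True)
    s_hat = signature / np.linalg.norm(signature)
    sigma_b = np.cov(x_hat, rowvar=False)   # covariance of normalized pixels about their mean
    sigma_inv = np.linalg.inv(sigma_b)
    s_w = sigma_inv @ s_hat                 # Sigma_b^-1 s, reused in numerator and denominator
    numerator = (x_hat @ s_w) ** 2
    x_quad = np.einsum("ij,jk,ik->i", x_hat, sigma_inv, x_hat)  # x^T Sigma_b^-1 x per pixel
    return numerator / ((s_hat @ s_w) * x_quad)

# Made-up image: 200 random pixels, one of which is a scaled copy of the signature.
rng = np.random.default_rng(2)
signature = np.array([0.4, 0.6, 0.3, 0.1])
pixels = rng.uniform(0.1, 1.0, size=(200, 4))
pixels[0] = 3.0 * signature
scores = ace_scores(pixels, signature)
```

By the Cauchy-Schwarz inequality the score lies between 0 and 1, and the pixel that is a scaled copy of the signature scores 1.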


2.5.4  Normalized  Difference  Index 

Another well-known detection approach is the use of the Normalized Difference
Index  (NDI).  A  reflectance-based  detection  method,  the  NDI  identifies  pixels  of  a 
hyperspectral  image  that  have  specific  spectral  properties  associated  with  a  known 
material.  It  has  been  utilized  in  remote  sensing  applications  to  identify  vegetation 
[43],  vegetative  water  [14],  urban  build-up  [58],  snow  [44],  vegetative  moisture  [24], 
and  barrenness  [59].  NDI  applications  to  human  skin  detection  have  been  proposed 
[37,  38],  and  will  be  discussed  in  Section  2.6. 

The NDI compares the values of an estimated reflectance vector at two wavelengths.
The general form of the NDI equation is:

$$d_{NDI}(\mathbf{r}') = \frac{r'(\lambda_1) - r'(\lambda_2)}{r'(\lambda_1) + r'(\lambda_2)}, \tag{19}$$

where $d_{NDI}(\mathbf{r}')$ is the score between $-1$ and $1$ given to an estimated reflectance
vector r', and the two wavelengths $\lambda_1$ and $\lambda_2$ reveal features that distinguish a certain
material from the rest of the image. Several NDI methods and their wavelengths are
listed in Table 2.
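
Eq. (19) can be sketched for a sampled reflectance vector by picking the bands nearest to the two requested wavelengths. The function name `normalized_difference`, the band centers, and the reflectance values are made-up assumptions; nearest-band selection is one simple choice and not necessarily how any particular sensor is handled.

```python
import numpy as np

def normalized_difference(reflectance, wavelengths_um, lambda1_um, lambda2_um):
    """Eq. (19) evaluated at the bands closest to lambda1 and lambda2."""
    reflectance = np.asarray(reflectance, float)
    wavelengths_um = np.asarray(wavelengths_um, float)
    i1 = np.argmin(np.abs(wavelengths_um - lambda1_um))
    i2 = np.argmin(np.abs(wavelengths_um - lambda2_um))
    r1 = reflectance[..., i1]
    r2 = reflectance[..., i2]
    return (r1 - r2) / (r1 + r2)

# Hypothetical six-band sensor and a made-up estimated reflectance vector.
wavelengths = np.array([0.54, 0.66, 0.86, 1.08, 1.24, 1.58])
r_est = np.array([0.30, 0.45, 0.55, 0.60, 0.50, 0.20])

# Human-skin wavelengths from Table 2: 1.08 and 1.58 micrometers.
ndsi = normalized_difference(r_est, wavelengths, 1.08, 1.58)
```

Because the index is a ratio, multiplying the whole reflectance vector by a constant leaves the score unchanged.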

2.6  Detecting  Human  Skin 

Detecting  the  spectral  signature  of  human  skin  is  necessary  for  ATR  systems 
designed  to  track  dismounts.  Skin  detection  precedes  spatial  feature  detectors  that 
search  for  the  shape  of  a  human  [9].  Further  analysis  can  then  indicate  other  relevant 
features  of  the  dismount  including  pose,  skin  type,  or  clothing  type.  Ultimately  these 
features  provide  information  to  determine  if  a  dismount  should  be  cued  for  further 
investigation. The development of some early skin detection methods with RGB
cameras is explained in this section, followed by some recent efforts to detect skin
with multispectral and hyperspectral cameras.

2.6.1  RGB  Skin  Detection 

Human skin color detection in RGB images has been studied extensively [54]. It
has  applications  that  include  facial  recognition  [20],  pose  recognition  [22],  content 
filtering  [13],  and  security  systems  [60]. 

RGB  pixels  have  three  channels,  described  in  Table  1.  The  three  channels  capture 
red,  green,  and  blue  light,  similar  to  the  spectral  response  of  the  human  eye.  Skin 
detection  methods  rely  on  the  clustering  of  skin  pixels  apart  from  background  pixels 
in  the  3-dimensional  RGB  vector  space,  shown  in  Fig.  5.  One  basic  way  to  detect 
skin is to observe that skin pixels are more red than green [5], and score pixels based
on  their  red  to  green  ratio.  However,  this  method  has  a  high  false-alarm  rate  because 
many  other  objects  that  are  not  skin  produce  a  high  red  to  green  ratio. 

Figure 5. Three-dimensional scatter plot of skin (red) and background (blue) pixels in
the  RGB  colorspace.  The  skin  pixels  have  a  higher  red  component  than  the  background 
pixels. 


A more accurate way to detect skin in RGB images is to estimate the probability
distribution functions (pdfs) of both skin and background pixels and perform the
likelihood ratio test (LRT) of the form

$$\frac{p(\mathbf{x}\mid\text{skin})}{p(\mathbf{x}\mid\text{background})} \underset{H_0}{\overset{H_1}{\gtrless}} \gamma, \tag{20}$$

where $p(\mathbf{x}\mid\theta)$ is the conditional pdf of a pixel vector x given the condition $\theta$ that the
pixel is either skin or background, and $\gamma$ is the decision threshold. The pdfs can be
calculated using a large training set, which leads to an optimal LRT, or estimated as
multidimensional Gaussian distributions [20].
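
The Gaussian variant of the LRT can be sketched as below, working with log-likelihoods for numerical stability. The helper names (`fit_gaussian`, `gaussian_logpdf`, `log_likelihood_ratio`), the synthetic RGB training clusters, and the threshold of 1 are assumptions made for illustration; they are not taken from the cited work.

```python
import numpy as np

def fit_gaussian(samples):
    """Mean vector and covariance matrix of the rows of `samples`."""
    return samples.mean(axis=0), np.cov(samples, rowvar=False)

def gaussian_logpdf(x, mean, cov):
    """Log-density of a multivariate normal, evaluated for each row of x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (mean.size * np.log(2.0 * np.pi) + logdet + maha)

def log_likelihood_ratio(x, skin_model, background_model):
    """Log of the likelihood ratio; compare the result with log(gamma)."""
    return gaussian_logpdf(x, *skin_model) - gaussian_logpdf(x, *background_model)

# Made-up RGB clusters standing in for hand-labeled training pixels.
rng = np.random.default_rng(3)
skin_train = rng.normal([200.0, 140.0, 120.0], 15.0, size=(1000, 3))
back_train = rng.normal([100.0, 110.0, 120.0], 30.0, size=(1000, 3))
skin_model, back_model = fit_gaussian(skin_train), fit_gaussian(back_train)

skin_test = rng.normal([200.0, 140.0, 120.0], 15.0, size=(200, 3))
back_test = rng.normal([100.0, 110.0, 120.0], 30.0, size=(200, 3))
detected = log_likelihood_ratio(skin_test, skin_model, back_model) > 0.0     # gamma = 1
false_alarm = log_likelihood_ratio(back_test, skin_model, back_model) > 0.0
```

These synthetic clusters are far better separated than real skin and background pixels, so the resulting rates say nothing about real-world performance.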

Other skin detection methods transform the RGB pixels into colorspaces that
have less correlation between components or vary less with illumination. Some of
these colorspaces are normalized RGB [49], Hue Saturation Value (HSV) [47], and the
luma-chroma colorspace YCrCb [20]. These colorspaces allow for detection rules in
fewer dimensions. However, these colorspaces are rearrangements of the same RGB
data,  and  do  not  distinguish  skin  pixels  from  background  pixels  any  better  than  the 
RGB  colorspace.  In  [2]  it  is  shown  that  the  optimal  LRT  detector  performs  as  well  or 
better  in  the  original  RGB  colorspace  as  compared  to  any  other  derived  colorspace. 

The  LRT  yields  better  skin  detection  performance  than  the  red  to  green  ratio, 
but  it  still  has  a  high  false-alarm  rate.  Some  background  objects,  like  cardboard, 
flesh-colored  mannequins,  sand,  tan  shirts,  and  khaki  pants  have  a  similar  color  to 
skin in the RGB colorspace. These objects are often misclassified as skin even with
the  optimal  LRT  detector.  This  suggests  that  there  is  inherently  an  upper  limit  to 
the  performance  of  RGB  skin  detection  methods.  The  limit  comes  from  the  fact  that 
RGB  cameras  collect  light  in  only  three  channels,  and  only  within  the  VIS  range  of 
the  spectrum. 

2.6.2  Hyperspectral  Skin  Detection 

To  overcome  the  limits  of  skin  detection  in  RGB  cameras,  recent  research  has 
investigated  the  appearance  of  human  skin  and  background  objects  in  cameras  that 
collect  electromagnetic  energy  outside  the  range  of  human  vision  [27,  37].  While  some 
flesh-colored  background  objects  have  the  same  red,  green,  and  blue  components  as 
human  skin,  they  reflect  light  much  differently  than  skin  in  the  NIR  and  SWIR  ranges 
of  the  spectrum.  Multispectral  and  hyperspectral  cameras  observe  electromagnetic 
energy  across  many  spectral  channels,  creating  multidimensional  clusters  of  skin  and 
background  pixels  that  are  more  distinct  than  in  RGB  imagery. 

Figure 6 illustrates the spectral reflectance of human skin with different melanin
levels. One paper proposes a Normalized Difference Skin Index (NDSI), a variant
of the NDI described in Section 2.5.4, to detect skin in hyperspectral images [37].
The NDSI compares estimated reflectance vectors at wavelengths $\lambda_1$ = 1.08 µm and
$\lambda_2$ = 1.58 µm. Pixels containing human skin have high NDSI values. However, the
leaves of many plants and trees also have high NDSI values, and can cause false alarms.
To mitigate this, the paper suggests false-alarm suppression using the Normalized
Difference Green Red Index (NDGRI) with $\lambda_1$ = 0.54 µm and $\lambda_2$ = 0.66 µm. Green
vegetation scores high on the NDGRI, while skin does not because skin is more red
than green as discussed in Section 2.6.1. Plotting the NDSI against the NDGRI, as
shown in Fig. 7, reveals the separation between skin and background clusters. Using
these metrics for skin detection yields a much lower false alarm rate than the optimal
LRT detection method in RGB images.

Figure 6. Spectral reflectance of human skin at VIS-SWIR wavelengths. Skin with less
melanin appears brighter because it has higher reflectance.
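
One way to combine the two indices is a pair of thresholds: a pixel is kept as skin only if its NDSI is high and its NDGRI is low. The sketch below assumes that rule; the function names, the threshold values (0.3 and 0.0), and the reflectance values for the three example materials are all made up for illustration and are not taken from [37].

```python
import numpy as np

def nd(a, b):
    """Normalized difference of two reflectance values or arrays."""
    return (a - b) / (a + b)

def skin_mask(r_1080, r_1580, r_540, r_660, ndsi_min=0.3, ndgri_max=0.0):
    """True where NDSI is high (skin-like) and NDGRI is low (not green vegetation)."""
    ndsi = nd(r_1080, r_1580)    # wavelengths 1.08 and 1.58 micrometers
    ndgri = nd(r_540, r_660)     # wavelengths 0.54 and 0.66 micrometers
    return (ndsi > ndsi_min) & (ndgri < ndgri_max)

# Columns: skin-like pixel, leaf-like pixel, spectrally flat tan object.
r_1080 = np.array([0.60, 0.50, 0.50])
r_1580 = np.array([0.20, 0.25, 0.50])
r_540 = np.array([0.30, 0.12, 0.40])
r_660 = np.array([0.45, 0.06, 0.50])

mask = skin_mask(r_1080, r_1580, r_540, r_660)
```

In this toy example the leaf-like pixel passes the NDSI test but is rejected by the NDGRI test, which is the false-alarm suppression described above.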

Hyperspectral dismount detectors have been designed that employ the NDSI with
NDGRI false alarm suppression [6, 9]. They search for a human shape by starting at
pixels marked as human skin, and are accelerated due to the false-alarm suppression
discussed previously. In another paper, the spectral reflectance of human skin in the
VIS to SWIR spectrum has been modeled as a function of several biometrics such as
melanin level and blood oxygen level [37]. The results have been used to estimate the
melanin level and skin type of detected dismounts in hyperspectral images [39].

Figure 7. Skin and background pixels are separated in NDSI vs NDGRI space. Skin
pixels, shown in red, receive a high NDSI score and a low NDGRI score.

2.7  Summary 

Designing a dismount detection camera that identifies the spectral signature of
human skin requires an understanding of the underlying physical model. SRR contains
information about objects in the scene, but that information must be extracted
from the atmospheric and illumination components. HSI cameras convert SRR into
pixel vectors (x), which can be converted into estimated reflectance vectors (r') by
applying atmospheric correction. Spectral detection algorithms may include atmospheric
correction as one of the preprocessing steps in an algorithm chain. Shortening
the algorithm chain leads to a faster detection algorithm, but may degrade performance.
The Matched Filter, Linear Discriminant, and Adaptive Coherence Estimator
are three detection algorithms that have short algorithm chains.

Human skin detection in RGB images is difficult because many objects have the
same color as human skin in the VIS region of the electromagnetic spectrum.
Hyperspectral skin detection performs better than RGB because it incorporates spectral
features in the NIR and SWIR bands. However, recent hyperspectral skin detection
methods can be improved by removing time-consuming calibration steps from the
algorithm chain.

3.  Methodology


The  sensor-reaching  radiance  (SRR)  from  an  object  can  vary  depending  on  the 
type  of  illumination,  the  composition  of  the  atmosphere,  and  the  angle  at  which  it 
is  observed.  Accounting  for  these  factors  can  dramatically  shorten  computation  time 
when  performing  human  skin  detection  on  hyperspectral  image  (HSI)  datacubes.  This 
thesis  suggests  a  process  to  derive  an  estimated  target  signature  that  provides  accurate 
target  detection  across  multiple  illumination  scenarios.  The  process  will  be  applied 
primarily  to  skin  detection  in  this  thesis. 

First,  differences  between  dismount  detection  and  remote  sensing  are  discussed  in 
Section  3.1.  Next,  the  physical  model  for  SRR  is  adapted  from  the  general  remote 
sensing  case  to  the  specific  dismount  detection  case  in  Section  3.2.  Then  in  Section  3.3, 
the Matched Filter (MF), Linear Discriminant Analysis (LDA), and Adaptive Coherence
Estimation (ACE) detection methods that were introduced in Section 2.5 are
each combined with the SRR model to create illumination-invariant dismount detectors.
These require an estimate of the scene illumination, which is simulated in
MODTRAN.  The  MODTRAN  simulation  setup  is  described  in  Section  3.4,  and  the 
procedure  for  processing  the  results  is  detailed  in  Section  3.5. 

3.1  Hyperspectral  Dismount  Detection 

Hyperspectral  dismount  detection  is  a  specific  application  of  the  more  general 
hyperspectral  material  identification  problem  that  has  been  thoroughly  studied  by 
the  remote  sensing  community.  Many  fundamental  assumptions  that  form  the  basis 
of  hyperspectral  detection  algorithms  in  remote  sensing  scenarios  remain  pertinent  to 
dismount  detection.  These  assumptions  help  to  shape  a  model  framework  that  can 
be extended to the dismount detection problem.

The physics model underlying remote sensing algorithms is introduced in Section 2.2.
Each HSI pixel represents SRR that is convolved through a channel filter
bank, scaled by a gain factor, and quantized to form a vector of non-negative digital
numbers. The SRR itself is a function of the solar irradiance distribution, atmospheric
transmission, scene geometry, object spectral reflectance, and scene-to-sensor
path radiance. The channel filter bank collects the SRR over hundreds of narrow adjacent
spectral bands to provide high-dimensional HSI data, which often has a much
lower inherent dimensionality due to correlation between channels. Remote sensing
algorithms search for HSI pixels that most likely represent materials with known spectral
reflectances under the given illumination conditions. All of these remote sensing
concepts can be extended to hyperspectral dismount detection.

Some aspects of dismount detection are different from the general remote sensing
model. One difference is that dismount detection is limited by a maximum effective
sensor-to-target range. The HSI camera must be close enough for a dismount
to occupy multiple pixels in order to apply any sort of spatial recognition method.
Imposing the more restrictive condition that at least some dismount skin pixels must
be spectrally pure, i.e., that an area of exposed skin must take up at least a full pixel,
limits the range even further. An estimate of a 10 cm × 10 cm pixel covering the exposed face
of a dismount puts the maximum effective range for full-pixel skin detection with the
HST3 camera at 100 meters (0.1 m / 0.001 rad), given its instantaneous field of view
(IFOV) of 1 milliradian [23]. This is a much closer range than the high-altitude scenarios
for which the remote sensing algorithms were developed. At this close range, there is
relatively little atmosphere between the sensor and the target, so the path radiance ($L_C$)
becomes insignificant. The governing equation for SRR in Eq. (4) reduces to:

$$L(\lambda) = \left[E'_s(\lambda)\cos\sigma'\,\tau_1(\lambda) + F E_{ds}(\lambda)\right] r(\lambda), \tag{21}$$

where $E'_s(\lambda)$ is the irradiance distribution from the sun measured at the top of the
Earth’s atmosphere, $\sigma'$ is the angle from the object to the sun relative to the surface
normal of the object, $\tau_1(\lambda)$ is the atmospheric transmittance along the sun-to-ground
path, $F$ is the fraction of the total sky exposed to the object, $E_{ds}(\lambda)$ is the average
downwelled skylight in the hemisphere above the object, and $r(\lambda)$ is the spectral
reflectance of the object. Equation (21) can be written compactly as $L(\lambda) = L_R(\lambda)r(\lambda)$,
the product of the scene illumination and the spectral reflectance of an object. This
simplification assumes Lambertian surfaces.
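
The close-range model above can be sketched directly from its definition: the scene illumination is the sum of a direct-sun term and a skylight term, and the radiance is that illumination times the reflectance. The function names and the three-band input values below are made-up assumptions; in this work the illumination terms come from atmospheric simulation rather than hand-picked numbers.

```python
import numpy as np

def scene_illumination(E_s, tau_1, E_ds, sigma_deg, F):
    """Scene illumination L_R: direct sunlight plus the visible fraction of skylight."""
    direct = E_s * np.cos(np.radians(sigma_deg)) * tau_1
    return direct + F * E_ds

def sensor_reaching_radiance(r, L_R):
    """L(lambda) = L_R(lambda) r(lambda), with path radiance neglected."""
    return L_R * r

# Made-up three-band example (arbitrary units).
E_s = np.array([1.8, 1.2, 0.6])      # top-of-atmosphere solar irradiance
tau_1 = np.array([0.7, 0.8, 0.9])    # sun-to-ground transmittance
E_ds = np.array([0.3, 0.1, 0.02])    # downwelled skylight
r = np.array([0.35, 0.55, 0.60])     # object reflectance

# Surface facing the sun (exposure angle 0) versus edge-on to the sun (90 degrees).
L_facing = sensor_reaching_radiance(r, scene_illumination(E_s, tau_1, E_ds, 0.0, 1.0))
L_grazing = sensor_reaching_radiance(r, scene_illumination(E_s, tau_1, E_ds, 90.0, 1.0))
```

At an exposure angle of 90 degrees the direct term vanishes and only the skylight term remains, so both the brightness and the spectral shape of the radiance change with surface orientation.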

Another  distinction  between  remote  sensing  and  dismount  detection  is  the  viewing 
angle  of  the  sensor.  Remote  sensing  algorithms  assume  a  downward-looking  sensor 
observing the surface of the Earth from a high-altitude platform. A dismount-detecting
sensor, in contrast, will likely be mounted on a ground vehicle, a structure, or
a low-altitude aircraft. The maximum effective altitude will be limited both by the
range  to  target  and  by  the  fact  that  the  camera  must  maintain  a  low  viewing  angle. 
Too  high  of  a  viewing  angle,  i.e.  looking  straight  down,  will  not  only  make  it  hard 
to  discern  the  humanoid  shape  of  standing  dismounts,  but  will  shrink  the  apparent 
surface  area  of  the  dismount  and  potentially  block  the  visibility  of  any  exposed  skin. 
Restricting  the  sensor  to  low  altitudes  removes  any  atmospheric  boundary  layers  from 
the  sensor-target  path,  like  clouds,  which  can  cause  large  variations  in  SRR. 

One consequence of a low-altitude sensor with a low viewing angle is a large
variety of illumination angles. In a remote sensing scenario, the sensor looks down
perpendicular to the surface of the Earth and the sunlight strikes from a known angle.
Variation in exposure angle ($\sigma'$) in Eq. (21) is due to surface texture and shadows
that occur near tall obstructions like trees or buildings [1]. For a dismount detection
sensor at a low viewing angle, there are more possibilities for scene geometry. In one
scenario, the sun is behind the sensor, directly illuminating the front of a standing
dismount. In another, the sun illuminates a dismount from either side. The sun could
also be behind the dismount, so that the only surface visible to the sensor is shadowed
and illuminated by skylight. In each of these cases, the sun angle from zenith could
range from very low when the sun is nearly overhead to very high at early morning or
late evening. All cases must be assumed to be equally probable, along with every other
possible combination of sun azimuth and zenith angles. Since the sun could illuminate
the scene from any position in the hemisphere overhead, there is a wide range of
possible illumination angles on a potential dismount target.

Even if the position of the sun relative to the sensor and dismount is known,
there is still a wide range of possible exposure angles for all visible surfaces of human
skin in the scene. This is because dismounts have surface normals pointing in many
different directions. For a direct illumination source like the sun, few surfaces of a
dismount are directly illuminated, that is, having surface normals pointing parallel to
the illumination path. Many surfaces are angled away from the illumination source,
which increases the exposure angle $\sigma'$. Accounting for all of these possible dismount
surface normals and illumination positions requires a 3D model of dismounts in different
positions. This type of data has recently been studied but is outside the scope
of this research [30]. Overcast skies may offer better scenarios for dismount detection,
as cloudy illumination is more evenly distributed from all directions. However, clouds
attenuate much of the SWIR spectrum.

To understand the impact of exposure angle on radiance from an object, consider
the example in Fig. 8. In Fig. 8, there are two objects in a scene, a flat rectangle
and a sphere, that are directly illuminated by sunlight. Both objects have the same
Lambertian spectral reflectance across all wavelengths, and differ only in shape. The
flat rectangle will appear to have a constant brightness across its entire surface, while
the sphere will be brighter in some areas and dimmer in others. Although they both
have Lambertian surfaces, they will reflect light at different intensities. This is not
because of specularity, but exposure angle $\sigma'$. Along the outside edges of the sphere,
the exposure angle is close to 90 degrees, bringing the $\cos\sigma'$ term in Eq. (21) close
to zero and making those parts of the sphere appear dark. Towards the center of the
sphere, the exposure angle is close to zero, like the rectangle, so the two objects have
the same brightness.

Figure 8. Example illustrating uneven illumination across a scene caused by different
surface orientations.
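
The rectangle-and-sphere example can be checked numerically with the direct-sun term alone. In the sketch below the function name `direct_brightness`, the unit illumination, the reflectance of 0.5, and the sampled exposure angles are assumptions; skylight is ignored so that only the cosine dependence is visible.

```python
import numpy as np

def direct_brightness(sigma_deg, E_direct=1.0, r=0.5):
    """Direct-sun radiance of a Lambertian surface; zero once the surface faces away."""
    cos_term = np.clip(np.cos(np.radians(sigma_deg)), 0.0, None)
    return E_direct * cos_term * r

# Exposure angles sampled from the center of the sphere out toward its edge.
sigma_sphere = np.array([0.0, 30.0, 60.0, 85.0])
sphere = direct_brightness(sigma_sphere)

# Every point on the flat rectangle faces the sun at the same angle (here 0 degrees).
plate = direct_brightness(np.zeros(4))
```

The rectangle has one brightness everywhere, while the sphere matches it only at the center and darkens toward the edge, even though both have the same reflectance.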

The preceding example demonstrates another assumption of remote sensing algorithms
that cannot be applied to dismount detection. Many remote sensing algorithms
search for a known reflectance vector by backing out the atmosphere and illumination
from the image pixels. As discussed in Section 2.4, a common method of converting
pixels to estimated reflectance vectors is to employ ELM correction. However, ELM
assumes a linear relationship between the SRR from an object and that object’s inherent
spectral reflectance. While that assumption may hold true for remote sensing
of the Earth’s flat surface, the above example illustrates that it does not apply to
short-range, low-angle dismount detection. A dismount may be illuminated unevenly
and have different brightness levels at different points, but that does not mean each
point has a different inherent spectral reflectance, as an ELM-corrected HSI datacube
would suggest. Thus, a dismount detection method should not search for a known
reflectance vector like many remote sensing methods. It should instead search for an
estimated target signature that is invariant to the unknown exposure angle.
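
The following sketch illustrates the point under a strong simplification: only the direct-sun term is kept (no skylight, no path radiance, unit gain), and the ELM calibration is assumed to come from a panel that faces the sun. The reflectance and illumination values are made up. Under these assumptions the ELM-estimated reflectance of the same material shrinks with the cosine of the exposure angle, while the direction of the pixel vector does not change.

```python
import numpy as np

# Made-up four-band skin reflectance and direct illumination.
r_skin = np.array([0.35, 0.55, 0.60, 0.20])
L_direct = np.array([1.0, 0.8, 0.6, 0.4])
sigma_deg = np.array([0.0, 45.0, 75.0])   # three patches of the same material

# Radiance of each patch: one row per exposure angle.
cos_term = np.cos(np.radians(sigma_deg))[:, None]
radiance = cos_term * L_direct * r_skin

# ELM calibrated on a sun-facing panel divides every pixel by the same illumination.
r_elm = radiance / L_direct

# Normalizing each pixel removes the unknown cosine factor.
unit = radiance / np.linalg.norm(radiance, axis=1, keepdims=True)
```

Here `r_elm` reports three different "reflectances" for one material, whereas the rows of `unit` are identical. Once skylight is included the scaling is no longer a single factor per pixel, so this invariance holds only approximately in practice.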

Uneven illumination also justifies the requirement of a full pixel on skin. A pixel
mixture model used in remote sensing will not work for dismount detection because a
pixel could cover the side of a face that is blocked from the sun together with some of
the background behind it that is fully exposed to the sun. A mixture model assumes
that both objects within the pixel are illuminated equally,
which may not be the case.

Other desirable features in dismount detection are to decrease computational complexity
and to increase robustness in different environments. Computational complexity
is rarely a concern with remote sensing algorithms because they are often applied
to individual images during post-flight analysis using plentiful computing resources
with no time constraints. In contrast, a dismount detection algorithm should operate
quickly on real-time snapshot hyperspectral images using embedded computing
resources.  The  usefulness  of  a  dismount  detection  sensor  for  target  recognition  is 
directly  tied  to  its  speed  and  agility.  It  also  depends  on  the  variety  of  environments 
in  which  it  can  be  used.  A  dismount  detection  sensor  should  remain  functional  in 
situations  where  there  are  no  ground  truth  objects  to  measure  illumination. 

3.2  Radiance-Based  Illumination-Invariant  Detection 

Understanding the aspects that make dismount detection distinct from remote
sensing helps derive relevant requirements for a detection method. The new dismount
detection method must be radiance-based and illumination-invariant. The first
requirement, radiance-based, is an alternative to reflectance-based detection, which is
popular in remote sensing. It bypasses much of the algorithm chain in Fig. 4 by
applying the target detection algorithm directly to the unprocessed image. The difference
is how the target detection rules are determined. In reflectance-based target
detection, ELM conversion produces estimated reflectance vectors that are searched
to find the known spectral reflectance of the target. The accuracy of the ELM conversion
depends on the availability of ground truth objects in the scene that must
be manually selected. Conversely, in radiance-based target detection, unprocessed
pixels are searched to locate the target signature as it would appear under the given
illumination conditions. The detection rule adjusts to the data instead of the data
adjusting to the detection rule.

The second requirement, illumination-invariance, means that the dismount
detection method should be robust in different environments. Different atmospheres
(clear, cloudy, dry, humid, urban or rural) combined with various sun angles (high in
the sky, low on the horizon, behind the camera, in front of the camera, or to the side)
are some of the environments encountered in dismount detection.
Illumination-invariance means that the dismount detector should function accurately
with no prior knowledge of these conditions. Instead of adjusting the detection rule
to fit the environment, unprocessed pixels are subjected to the same detection rule
with minimal impact on detection accuracy. An illumination-invariant,
radiance-based detection method is developed in this thesis. This method will enable
real-time hyperspectral dismount detection.

3.3  Dismount  Detection  Model


The  pixel  model  in  Section  2.2  can  be  tailored  to  the  dismount  detection  problem 
by  incorporating  the  assumptions  detailed  previously.  Extending  the  pixel  formula 
from  Eq.  (8)  to  a  vector  form  yields: 

$\mathbf{x} = a\,\mathbf{C}\,\mathbf{l}_{SRR}$,  (22)

where the vector $\mathbf{l}_{SRR} = [L(\lambda_1), \ldots, L(\lambda_N)]^T$ is the SRR sampled at the
center wavelengths of the channel filters, the entries of the diagonal transform matrix
$\mathbf{C}$ compensate for the width of each channel filter specific to the HSI camera, and
the constant
term $a$ accounts for the exposure time and gain factor set by the camera. The gain
factor ($a$) is set to maximize the overall signal-to-quantization-noise ratio (SNR$_Q$)
while preventing pixel saturation.

The product $\mathbf{C}\,\mathbf{l}_{SRR}$ is an approximation of the integral in Eq. (7) and is
appropriate for cameras with narrow channel bandwidths. A spectral filter with a narrow
bandwidth can be approximated by a weighted delta function, which reduces an
integral to a product by the sifting property.
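
The sifting step can be written out explicitly. As a sketch, assume the response of channel $i$ is modeled as a delta function of weight $c_i$ centered at the channel's center wavelength $\lambda_i$ (the symbol $c_i$ is introduced here only for illustration):

```latex
\int L(\lambda)\, C_i(\lambda)\, d\lambda
  \;\approx\; \int L(\lambda)\, c_i\, \delta(\lambda - \lambda_i)\, d\lambda
  \;=\; c_i\, L(\lambda_i)
```

Under this assumption the weight $c_i$ plays the role of the $i$th diagonal entry of $\mathbf{C}$, and $L(\lambda_i)$ is the $i$th entry of the sampled SRR vector.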

Writing the SRR vector $\mathbf{l}_{SRR}$ in terms of the dismount detection scenario from
Eq. (21), with no path radiance term, gives:

$\mathbf{l}_{SRR} = \mathbf{L}_R\,\mathbf{r}$,  (23)

which has a form similar to Eq. (22) in that the reflectance vector $\mathbf{r}$ is the object's
spectral reflectance, and the entries of the diagonal transform matrix $\mathbf{L}_R$ are the
SRR, both sampled at the center wavelengths of the channel filters. The reflectance
vector $\mathbf{r}$ in Eq. (23) differs from the estimated reflectance vector $\mathbf{r}'$ in Eq. (12)
because it represents the sampled inherent spectral reflectance of the object and not
the estimated reflectance vector derived from ELM correction. The reflected radiance
term $L_R$ is detailed in Eq. (21). In matrix form, it is an operator that transforms a
reflectance vector into an SRR vector.

Equation (23) can be substituted for the SRR vector $\mathbf{l}_{SRR}$ in Eq. (22) to produce

$\mathbf{x} = a\,\mathbf{C}\,\mathbf{L}_R\,\mathbf{r}$,  (24)

which is simplified to

$\mathbf{x} = \mathbf{T}\,\mathbf{r}$.  (25)

In this case, $\mathbf{T} = a\,\mathbf{C}\,\mathbf{L}_R$ can be described as a transformation matrix that transforms
the $N$-dimensional reflectance vector $\mathbf{r}$ into a pixel $\mathbf{x}$. It is a diagonal matrix, and
its diagonal entries are the point-wise products of SRR $\mathbf{L}_R$, channel response $\mathbf{C}$, and
gain factor $a$. The entries of the diagonal matrix $\mathbf{T}$ are:

$T_{ii} = a\,C_{ii}\,L_{R,ii}$,  (26)

where $T_{ii}$ is the $i$th diagonal entry of $\mathbf{T}$, $a$ is the camera gain factor, $C_{ii}$ is the channel
width factor of the $i$th channel, and $L_{R,ii}$ is the SRR intensity at the center wavelength
of the $i$th channel.
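
Because every matrix in Eqs. (24)-(26) is diagonal, the pixel model reduces to element-wise products. The following is a minimal NumPy sketch of that idea; the function names and the four-channel values are hypothetical and chosen only for illustration, not taken from any measured data.

```python
import numpy as np

def illumination_transform_diag(gain, channel_factors, srr):
    """Diagonal of T per Eq. (26): T_ii = a * C_ii * L_R,ii."""
    return gain * np.asarray(channel_factors, dtype=float) * np.asarray(srr, dtype=float)

def reflectance_to_pixel(t_diag, reflectance):
    """Eq. (25), x = T r; with a diagonal T this is an element-wise product."""
    return t_diag * np.asarray(reflectance, dtype=float)

# Hypothetical 4-channel values, for illustration only.
gain = 2.0
channel_factors = np.array([12.0, 12.0, 8.2, 8.2])
srr = np.array([0.9, 1.0, 0.4, 0.1])
reflectance = np.array([0.3, 0.5, 0.6, 0.2])

t_diag = illumination_transform_diag(gain, channel_factors, srr)
x = reflectance_to_pixel(t_diag, reflectance)

# Explicit matrix form of Eq. (25), to confirm the two agree.
x_matrix = np.diag(t_diag) @ reflectance
```

Storing only the diagonal keeps the per-pixel cost linear in the number of channels, which suits the embedded, real-time setting described in this chapter.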

Another  representation  of  the  diagonal  entries  of  T  is  the  SRR  entering  each 
channel  from  an  object  with  a  constant,  unity  spectral  reflectance  at  all  wavelengths. 
From  Eq.  (7),  this  can  be  written  as: 

$T_{ii} = a \int L(\lambda)\,C_i(\lambda)\,d\lambda$,  (27)

where $T_{ii}$ is the $i$th diagonal entry of the illumination transform $\mathbf{T}$, $a$ is the
camera gain factor (related to exposure time $t_{exp}$ in Eq. (7)), $L(\lambda)$ is the SRR as a
function of wavelength $\lambda$, and $C_i(\lambda)$ is the spectral response of the $i$th channel as a
function of wavelength $\lambda$.

Expressing  the  relationship  between  spectral  reflectance  and  pixel  vectors  as  a 
matrix  transform  further  justifies  the  preference  for  radiance-based  detection  over 
reflectance-based  detection.  To  estimate  the  spectral  reflectance  of  each  pixel  using 
ELM  correction,  the  inverse  of  T  must  be  applied  to  each  pixel: 

$\mathbf{r}' = \mathbf{T}^{-1}\mathbf{x}$,  (28)

where $\mathbf{r}'$ is the estimated reflectance vector of the object at pixel $\mathbf{x}$. However, the
inverse of $\mathbf{T}$ may not always exist. HSI cameras with channels that collect light at
wavelengths where the atmospheric transmission $\tau(\lambda)$ is low will have zero entries
along the diagonal of $\mathbf{T}$. These zero entries cause the determinant of $\mathbf{T}$ to be zero,
making it singular and non-invertible. There will be errors when comparing the
estimated reflectance vectors to known target spectral reflectances because of the
lack of illumination at some wavelengths. Instead, detection can be performed on the
unprocessed pixels $\mathbf{x}$ if the target spectral reflectance undergoes the same illumination
transform $\mathbf{T}$ as the pixels in the image to create a target signature $\mathbf{s}$.
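
The asymmetry between the two routes can be illustrated numerically. In the sketch below, all values are hypothetical: one channel is assumed to sit inside an attenuation band, so its diagonal entry is zero. The forward transform of Eq. (25) is still defined in every channel, while the inverse of Eq. (28) is undefined in the dark channel.

```python
import numpy as np

# Hypothetical diagonal of T; the third channel receives no illumination.
t_diag = np.array([0.5, 0.3, 0.0, 0.2])
T = np.diag(t_diag)
reflectance = np.array([0.4, 0.6, 0.5, 0.3])

# The determinant of a diagonal matrix is the product of its entries.
det_T = np.linalg.det(T)
invertible = bool(np.all(t_diag != 0))

# Radiance-based route: the forward transform is defined for every channel.
signature = t_diag * reflectance

# Reflectance-based route: only illuminated channels can be recovered.
x = t_diag * reflectance
lit = t_diag > 0
r_est = np.full_like(x, np.nan)
r_est[lit] = x[lit] / t_diag[lit]
```

Here `r_est` matches the true reflectance in the illuminated channels and stays undefined (NaN) in the dark one, whereas `signature` is usable as-is.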

3.3.1  Target  Signature  Generation 

The challenge in radiance-based detection is choosing an appropriate target
signature for the unknown illumination conditions. For a target with a known spectral
reflectance, an estimated target signature can be generated by applying Eq. (25) with
an estimate of the illumination transform $\mathbf{T}$:

$\mathbf{s} = \mathbf{T}\,\mathbf{r}$,  (29)

where $\mathbf{s}$ is the normalized target signature, $\mathbf{T}$ is the normalized illumination
transform, and $\mathbf{r}$ is the target reflectance vector. A target reflectance vector is derived by
sampling the measured target spectral reflectance at the center wavelengths of the
channels of the hyperspectral sensor. The normalized illumination transform $\mathbf{T}$ is a
diagonal matrix that is normalized such that its diagonal entries sum to one:

$\sum_{i=1}^{N} T_{ii} = 1$.  (30)
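
A short sketch of signature generation under these definitions follows. It assumes the signature is simply the product of the sum-to-one transform and the sampled reflectance; any further rescaling of the signature is not shown. The function names and numeric values are hypothetical.

```python
import numpy as np

def normalize_transform(t_diag):
    """Scale the diagonal of T so that its entries sum to one, as in Eq. (30)."""
    t = np.asarray(t_diag, dtype=float)
    return t / t.sum()

def target_signature(t_norm_diag, reflectance):
    """Apply the normalized diagonal transform to the target reflectance vector."""
    return np.asarray(t_norm_diag, dtype=float) * np.asarray(reflectance, dtype=float)

# Hypothetical 4-channel transform estimate and target reflectance.
t_est = np.array([4.0, 3.0, 2.0, 1.0])
target_reflectance = np.array([0.2, 0.4, 0.5, 0.1])

t_norm = normalize_transform(t_est)
s = target_signature(t_norm, target_reflectance)
```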

3.3.2  Linear  Discriminant  Detection 

The  LDA  detector  in  Eq.  (14)  can  be  adapted  for  unknown  illumination.  Assuming 
a  normalized  target  signature  (s)  from  Eq.  (29),  the  normalized  LDA  detector  is: 

$d_{LDA}(\mathbf{x}) = \mathbf{w}^T\mathbf{x}$,  (31)

where $d_{LDA}(\mathbf{x})$ is the score given to a pixel ($\mathbf{x}$) by the LDA detector, $\mathbf{w}$ is the linear
discriminant vector, and $\mathbf{x}$ is a normalized pixel. The linear discriminant vector $\mathbf{w}$
can be computed from:

$\mathbf{w} = \boldsymbol{\Sigma}_b^{-1}(\mathbf{s} - \boldsymbol{\mu}_b)$,  (32)

where $\boldsymbol{\Sigma}_b$ is the covariance and $\boldsymbol{\mu}_b$ is the mean of the set of all normalized pixel vectors
in an image, and $\mathbf{s}$ is the normalized target signature. Equation (32) is an adaptation
of Eq. (15) using the target signature ($\mathbf{s}$) instead of the target class mean ($\boldsymbol{\mu}_t$), and
ignoring the target class covariance.

Because  it  incorporates  the  statistics  of  the  image,  this  detector  adapts  to  each 
scene.  The  background  statistics  are  derived  from  every  pixel  in  the  image  under  the 
assumption  that  a  target  occupies  only  a  small  fraction  of  the  pixels  and  does  not 
significantly bias the statistics. A similar assumption underlies the anomaly detector
algorithm used by the CAP ARCHER sensor [42, 51].

As discussed in [51], the background covariance ($\boldsymbol{\Sigma}_b$) may not be invertible when
using full-spectrum pixels, because hyperspectral data has a low inherent
dimensionality. Hyperspectral pixels consist of hundreds of distinct spectral bands, but the
responses from many of the bands are highly correlated. In order to have an
invertible covariance matrix, the data must be projected onto a lower dimensional subspace
using a kernel (e.g., principal component analysis (PCA)).

PCA derives orthogonal dimensions for the normalized hyperspectral pixels by
performing eigendecomposition on the covariance matrix $\boldsymbol{\Sigma}_b$ [35]. Principal
components are the eigenvectors ranked in descending order by their eigenvalues. If the
columns of the $N \times K$ projection matrix $\mathbf{P}$ are the first $K$ principal components, with
$N > K$, then the matrix operation,

$\mathbf{y} = \mathbf{P}^T\mathbf{x}$  (33)

represents an $N \times 1$ normalized pixel ($\mathbf{x}$) as a $K \times 1$ vector ($\mathbf{y}$) in an orthogonal
subspace. When the number of principal components retained ($K$) is less than the
rank of the covariance matrix ($\boldsymbol{\Sigma}_b$), the covariance of the normalized pixels projected
into this subspace will be invertible. This new covariance ($\boldsymbol{\Sigma}_y$) is a diagonal matrix
with non-zero entries on its main diagonal.

Rewriting  the  LDA  equation  in  the  PCA  subspace  gives: 

$d_{LDA}(\mathbf{x}) = (\mathbf{s} - \boldsymbol{\mu}_b)^T\,\mathbf{P}\,\boldsymbol{\Sigma}_y^{-1}\,\mathbf{P}^T\mathbf{x}$,  (34)

where $d_{LDA}(\mathbf{x})$ is the score given to the pixel $\mathbf{x}$, $\mathbf{s}$ is the normalized target signature,
$\boldsymbol{\mu}_b$ is the background mean, $\mathbf{P}$ is the PCA projection operator from $N$ dimensions
to $K$ orthogonal dimensions, $\boldsymbol{\Sigma}_y$ is the background covariance matrix in the PCA
subspace, and $\mathbf{x}$ is the normalized pixel.

3.3.3  Adaptive  Coherence  Estimator  Detection 

Like  the  LDA  detector  in  Eq.  (34),  the  ACE  detector  from  Section  2.5.3  also 
requires  an  invertible  covariance  matrix.  PCA  will  be  used  to  project  the  normalized 
pixel  data  into  an  orthogonal  subspace  with  an  invertible  covariance  matrix  for  ACE 
detection.  In  the  PCA  subspace,  the  ACE  detector  from  Eq.  (18)  is: 

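$d_{ACE}(\mathbf{x}) = \dfrac{\left|\mathbf{s}^T\,\mathbf{P}\,\boldsymbol{\Sigma}_y^{-1}\,\mathbf{P}^T\mathbf{x}\right|^2}{\left(\mathbf{s}^T\,\mathbf{P}\,\boldsymbol{\Sigma}_y^{-1}\,\mathbf{P}^T\mathbf{s}\right)\left(\mathbf{x}^T\,\mathbf{P}\,\boldsymbol{\Sigma}_y^{-1}\,\mathbf{P}^T\mathbf{x}\right)}$,  (35)

where $d_{ACE}(\mathbf{x})$ is the score given to the pixel $\mathbf{x}$, $\mathbf{s}$ is the normalized target signature
computed by Eq. (29), the matrix $\mathbf{P}$ is the PCA projection of the top $K$ principal
components, $\boldsymbol{\Sigma}_y$ is the background covariance matrix in the PCA subspace, and $\mathbf{x}$ is
the normalized pixel.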
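
Eq. (35) can be sketched in the same way. The code below is an illustrative sketch with hypothetical names and synthetic data; it follows Eq. (35) as written, without mean subtraction. By the Cauchy-Schwarz inequality the score lies between 0 and 1, and a pixel that is a scalar multiple of the signature scores 1.

```python
import numpy as np

def pca_ace_scores(pixels, signature, k):
    """Score each row of `pixels` (M x N) with Eq. (35) using the top-k principal components."""
    pixels = np.asarray(pixels, dtype=float)
    sigma_b = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(sigma_b)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]
    P = eigvecs[:, order]                           # N x K projection matrix
    inv_var = 1.0 / eigvals[order]                  # diagonal of the inverse subspace covariance
    ys = P.T @ np.asarray(signature, dtype=float)   # signature in the subspace, shape (K,)
    yx = pixels @ P                                 # pixels in the subspace, shape (M, K)
    numerator = ((yx * inv_var) @ ys) ** 2
    denominator = ((ys * inv_var) @ ys) * np.einsum("ij,j,ij->i", yx, inv_var, yx)
    return numerator / denominator

# Synthetic, hypothetical scene: 400 background pixels plus one scaled copy of the signature.
rng = np.random.default_rng(1)
signature = np.array([0.5, 0.2, 0.1, 0.1, 0.05, 0.05])
background = rng.normal(loc=0.2, scale=0.05, size=(400, 6))
pixels = np.vstack([background, 3.0 * signature])
scores = pca_ace_scores(pixels, signature, k=4)
```

The last pixel, a brighter copy of the signature, receives the maximum score, which shows that the ACE score depends on spectral shape rather than overall brightness.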

Section  3.2  calls  for  a  radiance-based  illumination-invariant  dismount  detection 
method.  Using  any  one  of  the  previously  discussed  detectors,  MF,  LDA,  or  ACE, 
radiance-based  detection  can  be  performed  by  searching  a  raw  image  for  an  estimated 
target  signature.  To  achieve  illumination-invariance,  an  estimate  of  the  normalized 
illumination  transform  (T)  must  be  found  that  allows  these  detectors  to  perform  well 
under  a  wide  range  of  illumination  conditions.  This  will  enable  dismount  detection 
with  no  knowledge  of  the  illumination  conditions. 

3.4  Simulation  Setup 

The normalized illumination transform ($\mathbf{T}$) used in Eq. (29) is a function of the
sensor channel responses ($\mathbf{C}$) and the SRR ($\mathbf{l}_{SRR}$) reflected from a theoretical object
with a unity reflectance vector ($\mathbf{r} = \mathbf{1}$). Specific transforms can be generated for
a given HSI camera in a particular scenario using a radiative transfer (RT) code
to simulate the SRR in Eq. (21). The SRR from the theoretical "white" object is
integrated through the channel filters of the sensor as in Eq. (27) and normalized
to produce a specific illumination transform ($\mathbf{T}$). The illumination transforms for
many scenarios can be collected by running the radiative transfer code over a range
of different input conditions. From this collection, an estimate of the illumination
transform can be derived to use in Eq. (29).

3.4.1  MODTRAN®5 

The RT code MODTRAN® is used to simulate the SRR in Eq. (21). MODTRAN® is
a collaborative development between Spectral Sciences, Inc., and the Air Force
Research Lab Space Vehicles Directorate. It is an atmospheric simulation tool used in
the remote sensing community to account for the effects of the Earth's atmosphere on
imaging spectroscopy measurements. Many researchers use MODTRAN® to
demonstrate their spectral detection algorithms on simulated radiance data [18].

MODTRAN®5,  the  most  recent  release,  is  used  for  these  simulations.  The  wide 
range  of  input  parameters  for  MODTRAN®5  gives  the  user  precise  control  over  many 
different  aspects  of  the  simulation,  such  as  the  sensor  response,  target  properties, 
scene  geometry,  and  atmospheric  composition. 

3.4.2  HST3  Sensor  Response 

The sensor to be simulated in MODTRAN®5 is the HST3 hyperspectral camera
from HyperSpecTIR [23]. The HST3 pixels capture 227 spectral channels between
0.45 µm and 2.45 µm, with narrow channel bandwidths of 12 nm at VIS wavelengths
and 8.2 nm at SWIR wavelengths. Narrow channel bandwidths are necessary to
assume the illumination transform model discussed in Section 3.3.

3.4.3  Scene  Geometry


Each MODTRAN®5 run is simulated using a theoretical "white" panel as the
target of interest. The target has a unity spectral reflectance across VIS-SWIR
wavelengths. The MODTRAN®5 runs are performed over various scene geometries and
atmospheric conditions. In each simulation, both the sensor and the target are at sea
level altitude. The sensor is geographically located at 45°N latitude, aimed towards
the target 100 meters away to the north. The solar zenith angle is varied between
5, 15, 30, 45, 60, and 75 degrees to simulate different times of day.
The angle between the sun and the surface normal of the target, $\alpha'$ in Eq. (4), is
varied between 0, 45, and 90 degrees, such that $\cos\alpha'$ becomes 1, 0.707, and 0, respectively.
These different angles simulate the target being exposed to direct sunlight, indirect
sunlight, and shaded from sunlight. The form factor ($F$) in Eq. (4) is held constant
at 1, under the assumption that the area of visible sky above the target remains constant.

3.4.4  Atmospheric  Parameters 

Although  MODTRAN®5  enables  fine-tuning  of  many  atmospheric  parameters, 
it  also  includes  several  preset  atmospheres  that  replicate  common  conditions.  Some 
of  the  preset  atmospheres  are  used  in  these  simulations.  The  MODEL  parameter, 
which  creates  various  preset  geographical  and  seasonal  model  atmospheres,  will  take 
on  values  for  a  tropical  atmosphere  at  15°N  latitude,  mid-latitude  summer  at  45°N 
latitude,  mid-latitude  winter  at  45°N  latitude,  and  1976  US  standard  atmosphere. 
Also,  the  IHAZE  parameter,  determining  the  type  of  aerosol  extinction  model,  will 
be  changed  between  rural,  maritime,  urban,  and  desert.  The  VIS  parameter,  which 
scales  the  aerosol  extinction  content  to  create  a  given  meteorological  visibility  range 
in  kilometers,  will  be  set  to  4km,  8km,  12km,  16km,  20km,  and  24km. 

The different MODTRAN®5 parameters and their ranges of values for these
simulations are shown in Table 3.

Table 3. Names and ranges of MODTRAN®5 parameters varied in these simulations.

Parameter              Values
Sensor                 HST3 camera
Target                 unity reflectance
Sensor altitude        sea level
Target altitude        sea level
Range                  100 meters
Sun zenith angle       5°, 15°, 30°, 45°, 60°, 75°
Exposure angle α'      0°, 45°, 90°
Atmosphere model       tropical, mid-lat summer, mid-lat winter, 1976 US std.
Aerosol extinction     rural, maritime, urban, desert
Visibility             4 km, 8 km, 12 km, 16 km, 20 km, 24 km
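
The parameter sweep in Table 3 is a full factorial design, which can be enumerated with a short Python sketch. The variable names are hypothetical, and the sketch only builds the list of scenarios; it does not invoke MODTRAN® or reproduce its input format. The resulting count agrees with the 1728 transforms reported in Section 4.1.

```python
from itertools import product

# Values varied in Table 3; fixed parameters (sensor, target, altitudes, range) are omitted.
sun_zenith_deg = [5, 15, 30, 45, 60, 75]
exposure_angle_deg = [0, 45, 90]
atmosphere_model = ["tropical", "mid-latitude summer", "mid-latitude winter", "1976 US standard"]
aerosol_model = ["rural", "maritime", "urban", "desert"]
visibility_km = [4, 8, 12, 16, 20, 24]

# Every combination of the five varied parameters is one simulated scenario.
scenarios = list(product(sun_zenith_deg, exposure_angle_deg,
                         atmosphere_model, aerosol_model, visibility_km))
num_scenarios = len(scenarios)  # 6 * 3 * 4 * 4 * 6
```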


3.5  Estimated  Illumination  Transform 

For each combination of parameters in Table 3, the MODTRAN®5 simulation
produces a normalized illumination transform ($\mathbf{T}$) that describes the relationship
between a reflectance vector ($\mathbf{r}$) and a pixel vector ($\mathbf{x}$) for the HST3 camera in a
specific scenario. The mean normalized transform ($\mathbf{T}_\mu$) is the element-wise average
of the collection of $M$ simulated normalized transforms $\mathbf{T}_i$:

$\mathbf{T}_\mu = \dfrac{1}{M}\sum_{i=1}^{M}\mathbf{T}_i$.  (36)

This  normalized  transform  is  used  in  Eq.  (29)  to  create  an  estimated  target  radiance 
signature  for  the  MF,  LDA,  and  ACE  detectors. 
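
The averaging step of Eq. (36) is sketched below, assuming each simulated transform is stored as one row holding its diagonal entries. The function name and the small three-scenario example are hypothetical. Because each row sums to one, the mean also sums to one and needs no further normalization.

```python
import numpy as np

def mean_normalized_transform(transforms):
    """Element-wise mean of M normalized diagonal transforms, as in Eq. (36)."""
    t = np.asarray(transforms, dtype=float)   # shape (M, N), one diagonal per row
    t = t / t.sum(axis=1, keepdims=True)      # enforce the sum-to-one normalization of Eq. (30)
    return t.mean(axis=0)

# Hypothetical diagonals from three simulated scenarios with four channels each.
transforms = np.array([[4.0, 3.0, 2.0, 1.0],
                       [1.0, 1.0, 1.0, 1.0],
                       [2.0, 2.0, 1.0, 0.0]])
t_mu = mean_normalized_transform(transforms)
```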

Choosing  a  single  normalized  transform  to  use  in  all  scenarios  has  two  benefits. 
First,  a  scene  does  not  need  to  contain  a  set  of  ground  truth  objects  to  directly 
measure  the  illumination.  Eliminating  this  requirement  increases  the  usability  of  the 
dismount  detection  camera.  Second,  the  target  signature  remains  the  same  in  all 
cases,  not  needing  to  be  updated  to  the  illumination  conditions  in  each  scene.  This 
improvement would reduce computing time and increase detection speed.

3.6  Summary 

Radiance-based spectral detection can perform as well as reflectance-based
detection if the illumination conditions are taken into account. Rather than converting
HSI pixels into estimated reflectance vectors, which lengthens the algorithm chain, a
radiance-based detector can operate directly on the unprocessed pixel data with an
estimate of the illumination transform.

Dismount  detection  allows  certain  assumptions  to  be  made  that  simplify  the  SRR 
model  from  remote  sensing.  A  pixel  vector  is  assumed  to  be  the  product  of  scene 
illumination  and  an  object’s  spectral  reflectance.  Normalizing  the  pixel  vectors  in 
an  image  reduces  the  variability  caused  by  different  surface  orientations  of  objects 
in  the  scene.  A  normalized  target  signature  for  human  skin  is  generated  from  an 
illumination  transform  and  a  measurement  of  skin’s  spectral  reflectance.  Using  that 
signature,  the  MF,  LDA,  and  ACE  detectors  can  search  an  image  for  skin  pixels. 

Instead of measuring the illumination transform with calibration panels, which
lengthens the skin detection algorithm chain, it is estimated from a series of
MODTRAN simulations. The mean of a large dataset of MODTRAN simulations for the
illumination transform in various scenes provides a common illumination estimate. In
the  next  chapter,  the  MF,  LDA,  and  ACE  detectors  will  be  applied  to  hyperspectral 
images  using  a  skin  signature  derived  from  the  estimated  illumination  transform.  The 
results  will  demonstrate  the  ability  to  perform  dismount  detection  without  measuring 
scene  illumination. 

4.  Results  and  Analysis


This chapter presents the results of the illumination-invariant dismount detection
strategy outlined in Chapter 3. First, the MODTRAN simulation results for the
normalized illumination transform are presented in Section 4.1. Then the hyperspectral
data  collection  process  is  described  in  Section  4.2,  including  the  scene  setup  and  the 
measured  illumination  conditions.  A  dismount  is  shown  in  a  series  of  hyperspectral 
images,  and  the  spectral  reflectance  of  the  exposed  skin  of  the  dismount  is  measured. 
In  Section  4.3,  the  estimated  illumination  transform  is  compared  to  the  measured 
illumination  transform  in  each  scene  as  measured  from  a  white  Spectralon  calibration 
panel.  The  procedure  for  generating  a  normalized  target  signature  for  human  skin 
is  illustrated,  and  this  skin  signature  is  compared  against  the  actual  observed  skin 
pixels  from  each  image  in  Section  4.4. 

The reflectance-based skin detection method from [37] is used as a baseline for
comparing the performance of the three illumination-invariant methods from
Section 3.3. Section 4.5 describes the pixel scoring procedure for each method. Finally,
the detection results for all four methods are discussed in Section 4.6. Skin detection
images are qualitatively compared based on how well they display the outline of the
dismount while suppressing false alarms. Then the detection methods are
quantitatively compared with a series of ROC curves.

4.1  MODTRAN  Simulations 

The MODTRAN simulations in Section 3.4 produce 1728 normalized illumination
transforms for the range of different scenarios. Each transform represents the response
from each spectral channel of the HST3 camera to an object with a unity, Lambertian
spectral reflectance being illuminated by natural daylight under specific atmospheric
conditions. Section 3.3 explains how a normalized illumination transform is
multiplied by a reflectance vector to predict the normalized signature of a target with a
known spectral reflectance. To create a general-purpose illumination transform when
the illumination conditions are unknown, the 1728 simulation results are applied to
Eq. (36) in Section 3.5, creating an estimate of the normalized illumination transform
$\mathbf{T}$.

Figure 9. Left: 1728 different normalized illumination transforms from the simulations
described in Section 3.4. Right: Mean normalized illumination transform derived using
Eq. (36).

All 1728 individual simulation results are plotted in Fig. 9. Although the
conditions were unique for each simulation, they all produced similar illumination
transforms. The illumination transforms have the highest magnitude in the VIS portion of
the spectrum, from 450nm to 750nm, and taper off in the NIR and SWIR portions of
the spectrum, from 750nm to 2500nm, similar to the solar irradiance distribution plot
in Fig. 2. HST3 channels in the VIS portion of the spectrum have a wider FWHM
(12nm) than those in the rest of the spectrum (8.2nm), allowing more light energy
to pass through those channels. This is why the VIS channels sit higher relative to the
NIR-SWIR channels in Fig. 9 than in the solar irradiance curve in Fig. 2.

The widths of the HST3 channels also contribute to the consistent shape of the
illumination transforms throughout different atmospheric conditions. Slight changes in
the levels of certain atmospheric aerosols can significantly affect atmospheric
transmission ($\tau(\lambda)$) at the sub-nanometer scale. MODTRAN can simulate these effects
at high spectral resolution. Hyperspectral cameras, however, are limited in spectral
resolution by the FWHM of their channel filters, which are on the order of 10nm
wide. HST3 channels are too wide to capture the fine spectral resolution where slight
changes in atmospheric aerosols appear. As the atmospheric content changes, the
illumination shape has little variance about its mean, making it acceptable to estimate
the illumination transform by taking the mean of all simulation results.

The right side of Fig. 9 shows the diagonal entries of the mean normalized
illumination transform calculated from Eq. (36). Atmospheric attenuation bands, mentioned
in Section 2.2, are noticeable around 1350nm-1430nm and 1800nm-1950nm, just as in
Fig. 3. A relatively low amount of electromagnetic energy passes through the
atmosphere to the surface of the Earth at these wavelengths. This creates a problem when
converting pixels into reflectance vectors using ELM as described in Section 2.4.2.
Since the pixel values of the white and gray calibration panels are nearly zero at the
channels of the atmospheric attenuation bands, the solution to Eq. (11), finding the
ELM conversion constants for those channels, has a large error. Estimated reflectance
vectors using those conversion constants would be significantly inaccurate in the
atmospheric attenuation bands. To empirically estimate an object's spectral reflectance
at a given band, there must be sufficient illumination energy at that band. Natural
illumination, due to the atmospheric attenuation bands, is not ideal for full-spectrum
reflectance estimation. This justifies the need for radiance-based dismount detection
when relying on natural illumination.

4.2  Data  Collection 

A dataset of hyperspectral images is collected with the HST3 camera in Dayton,
OH, on 6 March 2009. The intent of the data collection is to capture the same dismount
detection scene multiple times under different illumination conditions. This is
accomplished by taking images at different times of the day as the sun is partially blocked
by passing clouds.

The hyperspectral dataset consists of four sequences, with 12 images in each
sequence, shown in Fig. 10. The first sequence is taken at 0945, the second at 1000,
the third at 1100, and the fourth at 1430. Each sequence captures the scene
illuminated by the sun at a different zenith angle. The weather is partly cloudy on the
date of the data collection, with mostly clear skies in the morning and overcast skies in
the afternoon. During the first two sequences at 0945 and 1000, the sun is partially
blocked by passing clouds, producing a different illumination for each image in each
sequence. In the third sequence, at 1100, the sky is mostly clear and the sun is high
in the sky, so the scene illumination is composed of direct sunlight. By the time the
fourth sequence is collected at 1430, the sky is overcast. Illumination in the fourth
sequence is dimmer and more indirect than in the previous sequences.

The scene configuration is similar to the conditions in the MODTRAN simulations
described in Section 3.4. The HST3 camera is set up at ground level, with the aperture
facing northeast. The setting is a suburban area that includes houses, trees, and a
road. Two Spectralon calibration panels are placed on the ground eight meters in
front of the HST3 camera. The panels stand upright, with the surface normal of each
panel pointing towards the camera to expose as much surface area as possible. These
panels are included in the scene to provide a spectral reflectance reference as required
in Section 2.4.2.

Figure 10. Hyperspectral data collected for this study, converted to RGB for display.
The first column is the sequence collected at 0945, the second at 1000, the third at
1100, and the fourth at 1430.

The  images  from  each  sequence  contain  a  dismount  with  arms  extended  out  to 
the side. The dismount, a Caucasian male with dark brown hair, is wearing a black,
cotton,  short-sleeved  t-shirt  and  blue,  denim  jeans.  Skin  is  exposed  on  the  face  and 
arms.  Starting  from  a  point  50  meters  away  from  the  camera  in  the  first  image,  the 
dismount  walks  towards  the  camera.  In  each  image,  the  dismount  progresses  closer  to 
the  camera,  finally  stopping  eight  meters  in  front  of  the  camera  in  the  twelfth  image. 

After  the  images  are  collected,  skin  pixels  are  selected  by  hand  to  create  truth 
masks.  Also,  the  pixels  of  both  of  the  Spectralon  calibration  panels  are  selected  for 
their  own  truth  masks.  The  spectral  reflectance  of  the  exposed  skin  on  the  forearms 
and  face  of  the  dismount  is  measured  100  times  with  an  ASD  contact  probe.  These 
measurements  are  sampled  at  the  center  wavelengths  of  the  HST3  channel  filters  as 
described  in  Section  3.3,  and  averaged  to  create  a  skin  reflectance  vector.  The  skin 
reflectance  measurements  and  the  resulting  mean  skin  reflectance  vector  are  shown 
in  Fig.  11. 

4.3  Illumination  Comparison 

The MODTRAN simulation results in Section 4.1 display the response of the HST3
camera to the radiance from an object with a constant, unity spectral reflectance
under simulated conditions. A similar process is used to measure the normalized
illumination transforms from each image in Fig. 10. The white Spectralon calibration
panel has a spectral reflectance that is nearly one at wavelengths between 400nm
and 2500nm. Its spectral reflectance, as collected by the ASD field spectrometer, is
shown in Fig. 12. After dividing by its known spectral reflectance, pixels of the white
panel estimate the scene illumination transform in the same way as the MODTRAN
simulations.

Figure 11. Left: 100 skin reflectance measurements taken with an ASD field
spectrometer of the exposed skin on the forearms and face of the dismount shown in the
HST3 images in Fig. 10. Right: Mean skin reflectance vector used to generate a target
signature.

Figure 12. Spectral reflectance of the white Spectralon calibration panel used to
measure scene illumination in each HST3 image. Since its spectral reflectance is very close
to one for wavelengths between 450nm and 2500nm, the white Spectralon panel makes
a good reference object for measuring scene illumination.

Figure 13. Normalized illumination transforms measured from the white Spectralon
calibration panel in each sequence. All twelve measurements for each sequence are
plotted as the colored lines. The dashed line is the estimated illumination transform
in Section 4.1 obtained from MODTRAN simulations.

The  normalized  illumination  transform  for  each  image  is  measured  by  averaging  all 
of  the  pixels  from  the  white  Spectralon  panel,  dividing  by  the  white  panel’s  spectral 
reflectance,  and  normalizing  the  result.  These  measurements  are  shown  in  Fig.  13. 
During  sequences  1  and  2,  the  sun  was  partially  blocked  by  passing  clouds,  causing 
the  measured  illumination  to  change  between  images.  This  appears  in  Fig.  13  as 
the distribution of colored curves. Sequences 3 and 4 had more consistent illumination between images. For comparison, the estimated illumination transform from Section 4.1 is also plotted in Fig. 13 as a black dashed line.

Figure 14. Illumination error given by the Euclidean distance between the estimated illumination and measured illumination in each frame. Illumination stayed consistently sunny in sequence 3 and cloudy in sequence 4. Passing clouds blocked the sunlight in frames 3, 4, and 9-12 of sequence 1, and frames 9-12 of sequence 2.

The  error  between  the  estimated  illumination  transform  used  to  generate  the 
target  signature  in  Eq.  (29)  and  the  measured  illumination  in  each  image  is  plotted 
in  Fig.  14.  The  error  is  computed  as  the  Euclidean  distance  between  the  diagonal 
entries  of  both  transforms: 


$$
\text{Illumination Error} = \sqrt{\sum_{i} \left( T_{a,ii} - T_{b,ii} \right)^{2}}, \tag{37}
$$

where $T_{a,ii}$ is the $i$th diagonal entry of the estimated illumination transform and $T_{b,ii}$ is the $i$th diagonal entry of the measured illumination transform.
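
The panel measurement and the error in Eq. (37) can be sketched in a few lines of NumPy. This is a minimal illustration, not the thesis code: the function names `normalized_illumination` and `illumination_error` are hypothetical, the toy values are invented, and the normalization is assumed to be scaling to unit Euclidean length, which this section does not state explicitly.

```python
import numpy as np


def normalized_illumination(panel_pixels, panel_reflectance):
    """Diagonal of a normalized illumination transform from white-panel pixels.

    panel_pixels: (num_pixels, num_channels) array of hand-selected panel pixels.
    panel_reflectance: (num_channels,) known spectral reflectance of the panel.
    """
    mean_pixel = np.asarray(panel_pixels, dtype=float).mean(axis=0)
    diagonal = mean_pixel / np.asarray(panel_reflectance, dtype=float)
    # Assumption: "normalizing" means scaling to unit Euclidean length.
    return diagonal / np.linalg.norm(diagonal)


def illumination_error(t_estimated, t_measured):
    """Euclidean distance between two transform diagonals, as in Eq. (37)."""
    diff = np.asarray(t_estimated, dtype=float) - np.asarray(t_measured, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))


# Toy three-channel example (illustrative values, not measured data).
panel_pixels = np.array([[2.0, 4.0, 0.0], [4.0, 4.0, 0.0]])
panel_reflectance = np.array([1.0, 1.0, 1.0])
t_measured = normalized_illumination(panel_pixels, panel_reflectance)
t_estimated = np.array([0.0, 1.0, 0.0])
print(t_measured, illumination_error(t_estimated, t_measured))
```

Only the diagonals are stored because the error is defined on the diagonal entries of the two transforms.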

The estimated illumination transform most closely matches the measured illumination transform in sequence 4, when the sky is overcast. In sequences 1 and 2, the illumination error drops as the sunlight is partially blocked by passing clouds between frames.

Figure 15. Generating a normalized skin signature using Eq. (29). The skin reflectance vector (r, left) of the dismount is transformed by the estimated illumination transform (T, middle) into an estimated skin signature (s, right).

4.4  Skin  Signature 

In  Section  2.5,  the  Matched  Filter  (MF),  Linear  Discriminant  Analysis  (LDA),  and 
Adaptive  Coherence  Estimator  (ACE)  spectral  detection  algorithms  were  introduced. 
These  algorithms  search  hyperspectral  pixels  for  a  normalized  target  signature  (s), 
assigning  scores  to  pixels  based  on  how  similar  they  are  to  the  target  signature.  For  a 
target  with  a  known  reflectance  vector  (r),  its  normalized  signature  can  be  generated 
using  the  estimated  illumination  transform  in  Eq.  (29).  This  section  compares  the 
empirical  appearance  of  human  skin  pixels  in  the  HST3  images  shown  in  Fig.  10  to 
the  estimated  skin  signature  generated  by  Eq.  (29). 
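
Generating a normalized signature from a known reflectance vector might look like the sketch below. Eq. (29) is not reproduced in this section, so the form used here (apply a diagonal transform, then scale to unit Euclidean length) is an assumption, and `skin_signature` is a hypothetical name.

```python
import numpy as np


def skin_signature(t_diagonal, reflectance):
    """Apply a diagonal illumination transform to r and normalize the result.

    Assumed form of Eq. (29): s = T r / ||T r||, with T diagonal.
    """
    raw = np.asarray(t_diagonal, dtype=float) * np.asarray(reflectance, dtype=float)
    return raw / np.linalg.norm(raw)


# Toy three-channel example (illustrative values only).
t_diagonal = np.array([1.0, 2.0, 2.0])
reflectance = np.array([3.0, 0.0, 2.0])
s = skin_signature(t_diagonal, reflectance)
print(s)
```

Both inputs are available before any image is taken, so the signature can be computed once and reused across scenes.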

The  diagram  in  Fig.  15  illustrates  the  concept  of  generating  a  skin  signature  with 
the  estimated  illumination  transform.  On  the  left  is  a  plot  of  the  average  spectral 
reflectance of skin shown in Fig. 11. In the middle is a plot of the estimated illumination transform shown in Fig. 9. Both of these are known a priori, before any of the images in Fig. 10 are taken. When these are combined in Eq. (29), the result is an estimated skin signature (s) shown on the right of Fig. 15. This skin signature requires no knowledge of the atmospheric or illumination conditions in the scene because it is generated from the estimated illumination transform in Fig. 9.

Figure 16. Normalized skin pixels hand-selected from each sequence of images. All twelve mean normalized skin pixels are plotted for each sequence as the colored lines. The dashed line is the estimated skin signature (s) generated from Eq. (29) using the estimated illumination transform.

To  verify  that  s  is  an  acceptable  estimate  of  the  actual  normalized  skin  signature, 
skin  pixels  collected  by  the  HST3  imager  are  compared  to  this  estimate  in  Fig.  16. 
Each  colored  line  represents  the  average  of  all  normalized  skin  pixels  from  one  image. 
Twelve colored lines, one for each image, are plotted for each sequence. The dashed black line is the estimated skin signature (s), shown for reference. In all four sequences, the estimated and actual normalized skin pixels are very similar, especially
at  VIS-NIR  wavelengths  from  400nm  to  900nm.  Between  900nm  and  1500nm,  s  has  a 
smaller  magnitude  than  do  the  actual  skin  pixels,  because  the  estimated  illumination 
transform  at  those  wavelengths  is  lower  than  what  is  measured.  At  SWIR  wavelengths 
from  1500nm  to  2500nm,  both  the  estimated  and  actual  skin  pixels  are  nearly  zero. 
Natural  illumination  is  weak  at  these  wavelengths  compared  to  VIS  wavelengths,  and 
so  is  the  spectral  reflectance  of  human  skin. 


4.5  Pixel  Scoring 


4.5.1  Reflectance-Based  Method 


The NDSI skin detection method described in Section 2.6.2 requires ELM correction to convert pixels into estimated reflectance vectors. ELM correction uses a linear
transform  that  is  derived  by  observing  the  radiance  from  two  Spectralon  calibration 
panels  with  known  spectral  reflectances.  For  each  HST3  image,  the  pixels  of  the  white 
and  gray  Spectralon  panels  are  selected  by  hand  and  averaged.  These  average  pixels 
are  applied  to  Eq.  (11)  to  generate  the  ELM  transform  matrices  M  and  b  in  Eq.  (12). 
The  ELM  transform  is  applied  to  the  pixels  to  create  estimated  reflectance  vectors. 
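
A two-panel ELM correction can be sketched as follows. Eqs. (11) and (12) are not reproduced in this section, so this uses the standard per-channel two-point line fit and assumes that M is diagonal; the names `elm_transform` and `apply_elm` and the panel values are hypothetical.

```python
import numpy as np


def elm_transform(p_white, p_gray, r_white, r_gray):
    """Fit a per-channel gain and offset from two panels of known reflectance.

    Returns (M, b) such that r' = M p + b, with M assumed diagonal.
    """
    p_white = np.asarray(p_white, dtype=float)
    p_gray = np.asarray(p_gray, dtype=float)
    r_white = np.asarray(r_white, dtype=float)
    r_gray = np.asarray(r_gray, dtype=float)
    gain = (r_white - r_gray) / (p_white - p_gray)
    offset = r_white - gain * p_white
    return np.diag(gain), offset


def apply_elm(pixels, M, b):
    """Convert pixels (num_pixels, num_channels) into estimated reflectance."""
    return np.asarray(pixels, dtype=float) @ M.T + b


# Toy two-channel example (illustrative panel values, not measured data).
p_white, p_gray = [100.0, 200.0], [20.0, 40.0]
r_white, r_gray = [0.99, 0.99], [0.19, 0.19]
M, b = elm_transform(p_white, p_gray, r_white, r_gray)
print(apply_elm([p_white, p_gray], M, b))
```

By construction the fit maps each averaged panel pixel back to that panel's known reflectance, which is why both panels must be visible in every image.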

Once  the  pixels  of  an  image  are  converted  into  estimated  reflectance  vectors,  they 
are  scored  by  the  NDSI  and  NDGRI  equations,  written  here: 


$$
\mathrm{NDSI} = \frac{r'(1080\,\mathrm{nm}) - r'(1580\,\mathrm{nm})}{r'(1080\,\mathrm{nm}) + r'(1580\,\mathrm{nm})}, \tag{38}
$$

$$
\mathrm{NDGRI} = \frac{r'(660\,\mathrm{nm}) - r'(540\,\mathrm{nm})}{r'(660\,\mathrm{nm}) + r'(540\,\mathrm{nm})}, \tag{39}
$$

where $r'(\lambda)$ is the value of the estimated reflectance vector ($\mathbf{r}'$) at the spectral channel closest to $\lambda$.


When  using  the  NDSI  method  of  skin  detection,  independent  thresholds  must  be 
applied  to  both  the  NDSI  and  NDGRI  scores,  creating  two  degrees  of  freedom  for 
the  classifier.  This  makes  it  difficult  to  generate  a  ROC  curve,  which  is  a  function  of 
a  single  threshold.  To  compare  the  performance  of  the  NDSI  skin  detection  method 
with  the  other  algorithms  by  their  ROC  curves,  the  NDSI  and  NDGRI  values  for  each 
pixel  must  be  combined  into  a  single  score. 

In  [37],  the  NDSI  was  proposed  as  a  score  for  how  likely  a  pixel  is  skin.  It  was 
noted  that  green  vegetation  created  false  alarms  because  it  also  scored  highly  on  the 
NDSI.  The  NDGRI  was  introduced  for  false-alarm  suppression  because  vegetation  is 
more  green  than  red,  while  skin  is  more  red  than  green.  A  skin  reflectance  vector 
has  a  high  NDSI  and  a  negative  NDGRI.  From  this  context,  a  single  score  is  used  in 
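this thesis that incorporates both the NDSI and NDGRI values. The score, denoted $d_{\mathrm{NDSI}}$, is equal to the NDSI if the NDGRI is negative, and is equal to the NDSI minus one if the NDGRI is positive:

$$
d_{\mathrm{NDSI}} =
\begin{cases}
\mathrm{NDSI}, & \text{if } \mathrm{NDGRI} < 0 \\
\mathrm{NDSI} - 1, & \text{if } \mathrm{NDGRI} > 0
\end{cases}
\tag{40}
$$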

where  NDSI  and  NDGRI  are  calculated  from  an  estimated  reflectance  vector  using 
Eq.  (38)  and  Eq.  (39). 

The output of Eq. (40) ranges between −2 and 1. Equation (40) is a fair representation of the NDSI skin detection method in one dimension instead of two. Skin
pixels  are  given  high  scores  because  they  have  a  high  NDSI,  while  the  NDGRI  carries 
out  its  false  alarm  suppression  purpose  by  reducing  the  score  of  green  vegetation. 
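
The combined score of Eq. (40) is a one-line piecewise function, sketched below. The names `normalized_difference` and `combined_ndsi_score` are hypothetical. Eq. (40) does not say what happens when the NDGRI is exactly zero, so treating that case like a positive NDGRI is an assumption of this sketch.

```python
import numpy as np


def normalized_difference(a, b):
    """Generic normalized difference (a - b) / (a + b); assumes a + b != 0."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b)


def combined_ndsi_score(ndsi, ndgri):
    """Single score per pixel: NDSI if NDGRI < 0, otherwise NDSI - 1."""
    ndsi = np.asarray(ndsi, dtype=float)
    ndgri = np.asarray(ndgri, dtype=float)
    # Assumption: NDGRI == 0 is penalized like NDGRI > 0.
    return np.where(ndgri < 0, ndsi, ndsi - 1.0)


# NDSI from reflectance at the 1080 nm and 1580 nm channels (toy values).
ndsi = normalized_difference([0.6, 0.6], [0.2, 0.2])
scores = combined_ndsi_score(ndsi, [-0.2, 0.3])
print(scores)
```

Subtracting one moves every pixel with a positive NDGRI below every pixel with a negative NDGRI and an NDSI above zero, so a single threshold can sweep both criteria at once.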

4.5.2  Illumination-Invariant  Methods


The three illumination-invariant detection algorithms applied to the HST3 images are the Matched Filter detector in Eq. (17), the Linear Discriminant Analysis detector in Eq. (34), and the Adaptive Coherence Estimator detector in Eq. (35). The MF detector assigns a score ($d_{\mathrm{MF}}$) to each pixel according to its correlation with the estimated skin signature (s). The value of $d_{\mathrm{MF}}$ ranges from 0 to 1.

Unlike the MF detector, the LDA and ACE detectors require an invertible background covariance matrix $\Sigma_b$. PCA is used to project the hyperspectral data onto its K principal components, making the background covariance matrix invertible. For this thesis, K = 12 is chosen as the number of principal components to retain. This value is chosen by trial and error: performance with K = 12 is better than with K = 4 or K = 8, but similar to that with K = 16.
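
The effect of the PCA projection on the covariance matrix can be illustrated with synthetic data. This is a sketch under assumptions: `pca_project` is a hypothetical name, the data are mean-centered before projection (the section does not say whether the thesis does this), and the synthetic pixels are random numbers chosen only to make the full covariance singular.

```python
import numpy as np


def pca_project(pixels, k):
    """Project pixels (num_pixels, num_channels) onto their top-k principal components."""
    x = np.asarray(pixels, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean  # assumption: data are mean-centered first
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    basis = eigvecs[:, ::-1][:, :k]  # top-k eigenvectors as columns
    return centered @ basis, basis, mean


# Synthetic 6-channel pixels driven by only 3 latent factors, so the
# full 6x6 covariance is (numerically) singular.
rng = np.random.default_rng(0)
pixels = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 6))

projected, basis, mean = pca_project(pixels, k=3)
cov_full = np.cov(pixels, rowvar=False)
cov_proj = np.cov(projected, rowvar=False)

print(np.linalg.eigvalsh(cov_full)[0])  # smallest eigenvalue, close to zero
print(np.linalg.cond(cov_proj))  # finite, so cov_proj can be inverted
```

Keeping too few components discards signal, while keeping too many reintroduces near-zero eigenvalues, which is consistent with K being tuned by trial and error.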

4.6  Detection  Results 

4.6.1  False  Alarm  Comparison 

Skin  detection  results  from  four  of  the  hyperspectral  images  are  compared  in 
Figs.  17,  18,  19,  and  20.  The  four  images  are  the  tenth  frames  from  each  of  the 
four  sequences.  These  four  images  are  compared  because  the  dismount  is  in  a  similar 
position  close  to  the  camera,  and  because  the  illumination  is  different  between  them. 
The  normalized  illumination  transforms  from  each  image,  measured  from  the  white 
Spectralon  panel,  are  shown  together  in  Fig.  21.  The  sky  is  partly  cloudy  in  the  first 
image,  mostly  clear  in  the  second  image,  cloudless  in  the  third  image,  and  overcast 
in  the  fourth  image.  Color  versions  of  the  images  are  in  the  top  left,  and  truth  masks 
of  hand-selected  skin  pixels  are  in  the  top  right  of  each  figure.  White  pixels  are  those 
that  the  detection  method  classifies  as  skin. 

[Figure 17 panel titles: Sequence 1, Frame 10; NDSI: PFA = 0.0009; MF: PFA = 0.0007; LDA: PFA = 0.0034; ACE: PFA = 0.0011]

Figure 17. Frame 10 from the first sequence of images. The sky is partly cloudy. Detection results are thresholded such that PD = 0.8. Illumination-invariant methods (middle right, bottom left, bottom right) have a PFA as low as the reflectance-converted NDSI method (middle left).

[Figure 18 panel titles: Sequence 2, Frame 10; Truth]

Figure 18. Frame 10 from the second sequence of images. The sky is mostly clear. Detection results have been thresholded such that PD = 0.8. Illumination-invariant methods (middle right, bottom left, bottom right) have a noticeably higher PFA than the reflectance-converted NDSI method (middle left).

[Figure 19 panel titles: Sequence 3, Frame 10; Truth; ACE: PFA = 0.0022]

Figure 19. Frame 10 from the third sequence of images. The sky is cloudless. Detection results have been thresholded such that PD = 0.8. Illumination-invariant methods (middle right, bottom left, bottom right) have a noticeably higher PFA than the reflectance-converted NDSI method (middle left).

[Figure 20 panel titles: Sequence 4, Frame 10; Truth; LDA: PFA = 0.0035; ACE: PFA = 0.0006]

Figure 20. Frame 10 from the fourth sequence of images. The sky is overcast. Detection results have been thresholded such that PD = 0.8. Illumination-invariant methods (middle right, bottom left, bottom right) have a much lower PFA than the reflectance-converted NDSI method (middle left).

Figure 21. Normalized illumination measured from the white Spectralon panel for frame 10 in all four sequences. The illumination conditions in these four images were unique, as indicated by the non-overlapping illumination vectors above.


Skin  detection  images  for  the  NDSI,  MF,  LDA,  and  ACE  algorithms  are  created  by 
applying  a  threshold  to  scored  pixels  such  that  80%  of  the  hand-selected  skin  pixels  are 
correctly  classified.  At  a  constant  detection  rate  of  PD  =  0.8,  the  corresponding  false- 
alarm  rate  indicates  how  well  each  algorithm  suppresses  false  alarms  while  detecting 
skin. 
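
Choosing the threshold that fixes the detection rate and then reading off the false-alarm rate can be sketched as below. The function name `pfa_at_pd` and the toy scores are hypothetical, and the sketch assumes that a pixel is classified as skin when its score is greater than or equal to the threshold.

```python
import numpy as np


def pfa_at_pd(scores, truth, pd_target=0.8):
    """Threshold scores so that pd_target of the truth pixels are detected.

    scores: per-pixel detector scores (any shape).
    truth: boolean mask of hand-selected skin pixels (same shape).
    Returns (threshold, false_alarm_rate).
    """
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    skin_scores = np.sort(scores[truth])[::-1]  # descending order
    # Small epsilon guards against floating-point round-up before ceil.
    n_detect = int(np.ceil(pd_target * skin_scores.size - 1e-9))
    threshold = skin_scores[n_detect - 1]
    detected = scores >= threshold
    false_alarms = np.count_nonzero(detected & ~truth)
    return threshold, false_alarms / np.count_nonzero(~truth)


# Toy example: 5 skin pixels followed by 10 background pixels.
scores = np.array([0.9, 0.8, 0.7, 0.6, 0.1,
                   0.95, 0.65, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.05, 0.0])
truth = np.array([True] * 5 + [False] * 10)
threshold, pfa = pfa_at_pd(scores, truth)
print(threshold, pfa)
```

Tied scores at the threshold can push the realized detection rate slightly above the target; the sketch does not correct for that.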

In  the  partially  cloudy  scene  (Fig.  17),  the  modified  NDSI  method  from  Eq.  (40) 
produces a false-alarm rate of PFA = 0.0009. The illumination-invariant algorithms have similar false-alarm rates on the same image: MF has PFA = 0.0007, LDA has PFA = 0.0034, and ACE has PFA = 0.0011. The outline of the dismount is discernible
in  all  four  detection  images. 

Results  on  the  mostly-clear  scene  (Fig.  18)  are  similar  to  those  in  the  partially 
cloudy  scene  (Fig.  17),  except  that  the  illumination-invariant  algorithms  produce 
more false alarms. The modified NDSI method has PFA = 0.0006, while MF has PFA = 0.0134, LDA has PFA = 0.0057, and ACE has PFA = 0.0059. The incorrectly detected pixels obscure the outline of the dismount in the MF results. A pink t-shirt and a red brick are incorrectly classified as skin in the LDA and ACE results.

The modified NDSI method continues to have a lower false-alarm rate in the cloudless scene (Fig. 19), with PFA = 0.0005, compared to MF with PFA = 0.0094, LDA with PFA = 0.0054, and ACE with PFA = 0.0022. However, all four algorithms have a lower PFA than in the mostly-clear scene (Fig. 18). This may be an artifact of mislabeled pixels in the hand-selected skin truth masks.

Out of the four frames displayed, the results of the overcast scene (Fig. 20) are the most dramatic. In this scene, the modified NDSI method, with PFA = 0.052, performs worse than the illumination-invariant methods: MF has PFA = 0.005, LDA has PFA = 0.0035, and ACE has PFA = 0.0006. The false alarms in the NDSI image obscure the shape of the dismount and appear spread out across the image. No single object caused its high PFA, unlike the t-shirt and brick that caused false alarms for the LDA and ACE methods in the other scenes.

ROC  curves  displayed  in  Fig.  22  compare  the  performance  of  all  four  detection 
methods  on  the  images  in  Figs.  17,  18,  19,  and  20.  A  horizontal  line  drawn  through 
the ROC curves at PD = 0.8 represents their operating points. Although a detection rate of only 80% is somewhat low for a target detection algorithm, it is used in this thesis because it accomplishes two objectives for a dismount detection system. First, even with only 80% of the skin pixels detected, the shape of the dismount is still noticeable in the thresholded images. Second, limiting the detection rate to PD = 0.8 keeps the
false  alarm  rate  very  low.  In  a  skin-cued  dismount  detection  system  like  the  one  in 
[6],  a  low  false  alarm  rate  is  more  important  than  a  high  detection  rate,  because  false 
alarms  increase  the  search  time. 

Out of the three illumination-invariant methods, the MF performs the worst. Its detection rate is lower than that of the LDA and ACE methods at false-alarm rates above PFA = 0.02, and it only outperforms the modified NDSI method on two of the four images. The LDA and ACE methods have the highest detection rate at false-alarm rates above PFA = 0.02. Below PFA = 0.02, the ACE method outperforms the LDA method, but the modified NDSI method performs better than both. The exception is in the overcast scene (Fig. 20), where the modified NDSI method detects fewer skin pixels than the illumination-invariant methods at all false-alarm rates.

[Figure 22 panel titles: Sequence 1, Frame 10; Sequence 2, Frame 10; Sequence 3, Frame 10; Sequence 4, Frame 10]

Figure 22. ROC curves from the four skin detection methods compared in Figs. 17, 18, 19, and 20. The dotted line at PD = 0.8 represents the operating point of the four methods, and where it crosses the ROC curves is the corresponding PFA for that method.

Notably, the NDSI method and the illumination-invariant methods produce visually similar detection results, even though the NDSI method requires the white and gray Spectralon calibration panels in order to perform ELM correction on the pixels, while the illumination-invariant algorithms do not.

4.6.2  Effect  of  Illumination  Error 

The four images discussed in the previous section are chosen because each is subject to different illumination conditions. The illumination-invariant skin detection
methods  have  lower  false  alarm  rates  in  the  cloudy  scene  (Fig.  20)  than  in  the  other 
scenes.  It  is  also  shown  in  Fig.  14  that  the  error  between  the  estimated  and  measured 
illumination  transform  is  the  lowest  in  the  cloudy  scene.  This  section  investigates  the 
effect  of  illumination  error  on  the  performance  of  the  detection  methods. 

The  four  graphs  in  Fig.  23  plot  false-alarm  rate  against  illumination  error  for  all  48 
images. The false-alarm rate is drawn from the ROC curve at the PD = 0.8 operating
point.  The  illumination  error  is  calculated  from  Eq.  (37). 

Neither the NDSI method nor the MF method appears to show a correlation between illumination error and false-alarm rate. This is not surprising for the NDSI method, which does not use the estimated illumination transform. However, it suggests that the MF method performs consistently, although poorly, even when the scene illumination varies. The LDA and ACE methods, on the other hand, show a general trend of an increasing false-alarm rate as the illumination error grows. Their highest false-alarm rates are nevertheless lower than the highest of the NDSI and MF methods. The LDA and ACE methods both detect human skin without using calibration panels, with performance comparable to the panel-dependent NDSI method.

Figure 23. False-alarm rate at PD = 0.8 plotted against illumination error for each method. The two factors appear independent of one another in the NDSI and MF methods, while they are somewhat correlated in the LDA and ACE methods.

4.6.3  Constant  Threshold  Comparison 

In  Figs.  24,  25,  26,  and  27,  the  skin  detection  methods  are  applied  to  each  sequence 
of  images  to  illustrate  what  they  would  display  in  a  real-time  dismount  detection 
scenario.  Unlike  the  previous  sections  where  the  thresholds  were  set  to  achieve  a 
desired  detection  rate,  the  thresholds  in  this  section  are  held  constant  throughout 
all  four  sequences.  This  is  more  realistic  for  a  real-time  dismount  detection  scenario. 
The  modified  NDSI  scores  are  thresholded  at  0.25,  the  MF  scores  at  0.975,  the  LDA 
scores  at  37,  and  the  ACE  scores  at  0.983.  Pixels  that  score  above  these  thresholds 
are  classified  as  skin.  Detection  results  for  the  modified  NDSI  method  are  included 
for  reference,  even  though  that  method  cannot  operate  in  real-time  because  it  relies 
on  the  hand-selected  calibration  panels. 

The LDA method mistakes the pink t-shirt and the red brick in the center of the scene for a dismount in the first three sequences, but not in the fourth sequence. It is capable of a lower false-alarm rate in the sunny scenes, as demonstrated in Fig. 17, but that requires adjusting the detection threshold. Ideally, a real-time dismount detection algorithm should maintain consistent performance between different illumination conditions without changing the detection threshold.

The pink t-shirt also causes false alarms for the ACE method in sunny scenes, but not as many as for the LDA method. While the false alarm rate of the ACE method decreases under cloudy skies, the opposite is true for the MF method. When the sky is cloudy in the fourth sequence (Fig. 20), the MF method produces false alarms on the pink t-shirt and on a piece of cardboard above the t-shirt. This also occurs as a cloud momentarily blocks the sun in the first sequence (Fig. 17). A passing cloud changes the scene illumination in frames 3 and 4, causing false alarms in the MF method. Conversely, the false alarm rate decreases for the ACE method in the same two frames.

Figure 24. Detection results for the first sequence with constant detection thresholds of 0.25 for NDSI, 0.975 for MF, 37 for LDA, and 0.983 for ACE. As a passing cloud blocks the sunlight in frames 3 and 4, the number of false alarms increases in the MF method and decreases in the ACE method.

Figure 25. Detection results for the second sequence with constant detection thresholds of 0.25 for NDSI, 0.975 for MF, 37 for LDA, and 0.983 for ACE. A pink t-shirt and a red brick cause false alarms in the LDA and ACE methods.

Figure 26. Detection results for the third sequence with constant detection thresholds of 0.25 for NDSI, 0.975 for MF, 37 for LDA, and 0.983 for ACE. The MF method does not require in-scene calibration panels, yet its skin detection results look similar to the NDSI method, which does require the panels.

Figure 27. Detection results for the fourth sequence with constant detection thresholds of 0.25 for NDSI, 0.975 for MF, 37 for LDA, and 0.983 for ACE. In these cloudy scenes, the illumination-invariant methods outperform the reflectance-based NDSI method.

The performance of the four skin detection methods is quantitatively compared in ROC curves in Figs. 28, 29, 30, and 31. These ROC curves reveal a few trends. One is that the false-alarm rate for the NDSI method increases in the cloudy fourth sequence. Also, the LDA and ACE methods have a higher detection rate than the NDSI method at false-alarm rates above PFA = 0.01, but the MF method does not. The MF method only occasionally outperforms the NDSI method.

The Area Under the Curve (AUC) values for the ROC curves are plotted in Fig. 32. As the name suggests, the AUC is the area under the ROC curve. It represents the average PD across all PFA between zero and one. A high AUC is a quality of a good detector. Across all four sequences, the LDA and ACE methods have a high AUC, which remains high when the illumination changes. The NDSI method also has a high AUC, but not in the cloudy fourth sequence. The MF method consistently has the lowest AUC.
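
A ROC curve and its AUC can be computed from scored pixels and a truth mask as in the sketch below. The names `roc_curve` and `auc` are hypothetical (this is not the scikit-learn function of the same name), the area is taken with the trapezoidal rule, and tied scores are not merged into a single threshold step, which is a simplification.

```python
import numpy as np


def roc_curve(scores, truth):
    """Return (pfa, pd) arrays obtained by lowering the threshold one pixel at a time."""
    scores = np.asarray(scores, dtype=float).ravel()
    truth = np.asarray(truth, dtype=bool).ravel()
    order = np.argsort(-scores, kind="stable")  # highest score first
    hits = truth[order]
    pd = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    pfa = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    return pfa, pd


def auc(pfa, pd):
    """Trapezoidal area under the ROC curve (average PD over PFA in [0, 1])."""
    return float(np.sum(np.diff(pfa) * (pd[:-1] + pd[1:]) / 2.0))


# Toy example: skin pixels mostly, but not always, outscore background pixels.
scores = [0.9, 0.8, 0.7, 0.6]
truth = [True, False, True, False]
pfa, pd = roc_curve(scores, truth)
print(auc(pfa, pd))
```

An AUC of one means every skin pixel outscores every background pixel, so values just below one indicate that only a few background pixels score above skin.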

Another trend in the ROC curves is that the LDA method has a lower detection rate than the ACE method at very low false-alarm rates. The ACE method is better than the LDA method at false-alarm suppression. Furthermore, in sunny scenes when the NDSI method performs well, it has a higher detection rate when PFA < 0.01 than any of the illumination-invariant methods. The NDSI method was designed to suppress false alarms, and it does this better than the other methods when the scene is brightly illuminated. The illumination-invariant methods were designed to detect skin pixels under a wide variety of illumination conditions. The ACE and LDA methods accomplish this by having a high detection rate even as the illumination changes.

Figure 28. ROC curves comparing the NDSI, MF, LDA, and ACE detection methods on the images in sequence 1.

Figure 29. ROC curves comparing the NDSI, MF, LDA, and ACE detection methods on the images in sequence 2.

Figure 30. ROC curves comparing the NDSI, MF, LDA, and ACE detection methods on the images in sequence 3.

Figure 31. ROC curves comparing the NDSI, MF, LDA, and ACE detection methods on the images in sequence 4.

Figure 32. AUC comparison of the ROC curves in Figs. 28, 29, 30, and 31. The LDA and ACE methods have consistently high AUCs across different illumination conditions, while that of the NDSI method drops in the cloudy fourth sequence.




4.7  Summary 


This chapter presents the results of using the target signature generation process outlined in Chapter 3. The MODTRAN simulations of normalized
illumination  transforms  follow  a  general  pattern  of  high  magnitude  in  VIS  and  low 
magnitude  in  SWIR  with  atmospheric  attenuation  bands  characteristic  of  natural 
illumination.  The  estimated  illumination  transform  is  compared  to  the  measured 
illumination  transform  from  four  sequences  of  hyperspectral  images,  and  it  has  the 
lowest  error  in  the  cloudy  scenes.  This  estimated  illumination  transform  is  used  to 
generate a target signature from the measured spectral reflectance of the dismount’s
exposed  skin.  The  target  signature  has  low  magnitude  at  SWIR  wavelengths  where 
both  the  illumination  transform  and  the  spectral  reflectance  of  human  skin  are  low. 

The  reflectance-based  skin  detection  method  from  [37]  is  modified  to  produce  a 
scalar  score  for  each  pixel,  so  that  it  can  be  thresholded  to  form  a  ROC  curve.  ROC 
curves  are  used  to  evaluate  it  against  the  illumination-invariant  methods. 

The  detection  results  of  all  four  detection  methods  are  qualitatively  evaluated 
by  comparing  their  ability  to  suppress  false  alarms  at  a  constant  detection  rate. 
The modified NDSI method produces low false-alarm rates (PFA < 0.001) in sunny scenes, but has a higher false-alarm rate than the others (PFA > 0.05) in the cloudy scene. The illumination-invariant methods did not need the calibration panels in the scene like the NDSI method, and yet they achieved comparable skin detection. The ACE and LDA methods show a pattern of increasing false-alarm rates as the illumination error increases, but their average false-alarm rates are still lower than those of the MF method.

The four methods are quantitatively compared by observing their detection performance across each frame while the detection thresholds are held constant. This demonstrates what would be seen in a real-time dismount detection scenario. The LDA and ACE methods are able to maintain consistent average detection rates, as characterized by their AUC value, even as the illumination conditions of the scene change from sunny to cloudy.

5.  Conclusions  and  Future  Work


This chapter concludes the thesis by summarizing the important points and proposing ideas for further research on this topic. The process of characterizing a hyperspectral pixel and predicting a skin pixel from a simulated illumination transform is described in Section 5.1. Then a few possible ideas for future research in real-time hyperspectral dismount detection are discussed in Section 5.2. Finally, the main contributions of this thesis to the ATR community are explained in Section 5.3.

5.1  Summary  of  Methods  and  Conclusions 

The  goal  of  this  thesis  is  to  shorten  the  dismount  detection  algorithm  chain  by 
identifying  a  method  that  does  not  require  in-scene  calibration  panels  for  radiometric 
calibration.  The  new  method  should  be  robust  in  different  illumination  conditions, 
and achieve similar detection performance to the reflectance-based method that requires the calibration panels.

This  goal  is  met  by  understanding  the  appearance  of  skin  in  hyperspectral  im¬ 
agery  and  predicting  its  appearance  across  a  wide  range  of  different  illumination 
conditions.  A  hyperspectral  image  pixel  is  expressed  as  a  function  of  an  object’s 
spectral  reflectance,  the  scene  illumination,  and  the  spectral  response  of  the  hyper¬ 
spectral  camera.  In  vector  notation,  this  expression  reduces  to  a  transform  between 
a  reflectance  vector,  which  is  an  object’s  spectral  reflectance  sampled  at  the  center 
wavelengths  of  the  sensor’s  channel  filters,  and  a  hyperspectral  pixel.  The  transform 
is  referred  to  as  an  illumination  transform  because  it  accounts  for  the  effects  of  the 
illumination  conditions  on  the  appearance  of  an  object  in  the  scene.  It  is  simulated 
in  MODTRAN  over  1728  different  scenarios  by  independently  varying  a  set  of  input 
parameters.  The  resulting  set  of  illumination  transforms  reveals  that  the  spectral 
distribution  of  natural  daylight  follows  a  predictable  pattern  and  can  be  adequately
estimated  by  taking  the  mean  of  that  set. 
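
The averaging step and its use can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation. It assumes that each simulated illumination transform is stored as a per-band gain vector that multiplies the reflectance vector elementwise, and that each transform is scaled to unit norm before averaging. The function names, array shapes, and toy numbers are hypothetical.

```python
import numpy as np

def mean_illumination_transform(transforms):
    """Average a stack of simulated transforms, shape (n_scenarios, n_bands).

    Assumption: each transform is a per-band gain vector, scaled to unit
    Euclidean norm first so that overall brightness does not dominate.
    """
    transforms = np.asarray(transforms, dtype=float)
    norms = np.linalg.norm(transforms, axis=1, keepdims=True)
    return (transforms / norms).mean(axis=0)

def estimate_signature(reflectance, mean_transform):
    """Predict a pixel signature by applying the transform band by band."""
    return np.asarray(reflectance, dtype=float) * mean_transform

# Toy data: 3 scenarios, 4 bands (the simulated set would be 1728 x n_bands).
simulated = np.array([[2.0, 4.0, 4.0, 2.0],
                      [1.0, 2.0, 2.0, 1.0],
                      [3.0, 6.0, 6.0, 3.0]])
skin_reflectance = np.array([0.3, 0.5, 0.6, 0.2])  # placeholder values

t_mean = mean_illumination_transform(simulated)
skin_signature = estimate_signature(skin_reflectance, t_mean)
```

In this toy set the three scenarios differ only in brightness, so normalization makes them identical and the mean transform equals any one of them.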

The  mean  illumination  transform  is  used  to  convert  a  measured  skin  reflectance 
vector  into  an  estimated  skin  signature.  This  skin  pixel  estimate  is  used  by  three 
different  detection  methods  on  a  set  of  test  images.  They  are  considered
illumination-invariant  detection  methods  because  they  use  a  skin  pixel  estimate  that  does  not
change  between  scenes.  They  are  evaluated  against  the  NDSI  skin  detection  method, 
which  is  a  reflectance-based  method  that  requires  a  pair  of  calibration  panels  in  each 
scene  in  order  to  perform  ELM  correction. 
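
For readers unfamiliar with two of the detectors named in this chapter, the sketch below gives the common textbook forms of the Matched Filter and the Adaptive Coherence Estimator. It is an illustration under stated assumptions rather than the exact formulation used in Chapter 4: the background mean and covariance are estimated from the image pixels themselves, and the target is a single fixed signature such as the estimated skin pixel. All names and the synthetic data are hypothetical.

```python
import numpy as np

def background_stats(pixels):
    """Mean and inverse covariance of the pixels, shape (n_pixels, n_bands)."""
    mu = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False)
    return mu, np.linalg.inv(cov)

def matched_filter(pixels, target, mu, cov_inv):
    """Textbook matched filter: near 0 for background, 1 at the target signature."""
    s = target - mu
    x = pixels - mu
    return (x @ cov_inv @ s) / (s @ cov_inv @ s)

def ace(pixels, target, mu, cov_inv):
    """Textbook ACE: squared cosine between whitened pixel and target, in [0, 1]."""
    s = target - mu
    x = pixels - mu
    num = (x @ cov_inv @ s) ** 2
    den = (s @ cov_inv @ s) * np.einsum("ij,jk,ik->i", x, cov_inv, x)
    return num / den

# Synthetic example: 500 background pixels plus one pixel equal to the target.
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 4))
target = np.array([3.0, -2.0, 1.0, 4.0])
pixels = np.vstack([background, target])

mu, cov_inv = background_stats(pixels)
mf_scores = matched_filter(pixels, target, mu, cov_inv)
ace_scores = ace(pixels, target, mu, cov_inv)
```

Both detectors score the last pixel, which equals the target, at 1. ACE depends only on the direction of the whitened pixel, which makes it insensitive to a pixel's overall brightness.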

The  Matched  Filter  achieves  good  false-alarm  suppression  in  some  scenes,  with  a
Pfa  as  low  as  0.0007  in  Fig.  17,  but  it  has  a  lower  average  skin  pixel  detection  rate
(AUC  <  0.95)  than  the  other  methods.  In  sunny  scenes,  the  Adaptive  Coherence
Estimator  and  the  method  based  on  Linear  Discriminant  Analysis  have  a  higher  skin
pixel  detection  rate  (Pd  >  0.95)  than  the  NDSI  method  when  Pfa  >  0.02,  but  have
a  much  lower  detection  rate  than  the  NDSI  method  when  Pfa  <  0.01.  In  cloudy 
scenes,  however,  the  illumination-invariant  skin  detection  methods  have  noticeably 
lower  false-alarm  rates  (Pfa  <  0.01)  than  the  NDSI  method  when  the  detection  rate 
is  set  to  Pd  =  0.8. 
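
The comparison at a fixed detection rate can be made concrete with a short sketch. It shows one way to read a false-alarm rate off a set of detector scores at Pd = 0.8, assuming a ground-truth mask is available and that higher scores mean "more skin-like". The function name and the toy scores are hypothetical and are not results from this thesis.

```python
import numpy as np

def pfa_at_pd(scores, is_target, pd=0.8):
    """False-alarm rate at the threshold that detects a fraction `pd` of targets.

    Higher scores are assumed to mean "more target-like".
    """
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    target_scores = np.sort(scores[is_target])[::-1]          # descending
    # Number of target pixels that must pass; small tolerance guards rounding.
    n_needed = max(1, int(np.ceil(pd * target_scores.size - 1e-9)))
    threshold = target_scores[n_needed - 1]
    return float(np.mean(scores[~is_target] >= threshold))

# Toy example: first five pixels are skin, last five are background.
scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.65, 0.3, 0.1, 0.05, 0.0]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
example_pfa = pfa_at_pd(scores, labels, pd=0.8)
```

Here four of the five skin pixels must pass, so the threshold falls at 0.6, and one of the five background pixels exceeds it.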

5.2  Future  Work 

This  section  proposes  further  areas  of  research  that  are  pertinent  to  hyperspectral 
dismount  detection. 

5.2.1  Combine  Skin  and  Illumination  Covariance 

A  hyperspectral  target  detection  method  called  the  Kelly  detector  accounts  for 
variations  in  the  target’s  spectral  reflectance  [32].  In  the  Kelly  detector,  the  spectral
reflectance  of  the  target  is  expressed  as  a  random  vector  with  a  known  mean  and
covariance.  After  the  image  is  converted  into  estimated  reflectance  vectors,  the  mean 
target  vector  is  subtracted  from  all  pixels,  which  are  then  divided  by  the  target 
covariance.  Target  pixel  vectors  are  detected  by  selecting  the  pixels  with  low  absolute 
magnitudes  at  the  end  of  this  process. 
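
The procedure described above can be sketched as follows. The sketch reads "divided by the target covariance" as whitening with the inverse covariance matrix, so the final magnitude is the squared Mahalanobis distance from the target distribution. That reading is an assumption, and the code is not the full detector of [32]. The function names, threshold, and numbers are hypothetical.

```python
import numpy as np

def distance_to_target(pixels, target_mean, target_cov):
    """Squared Mahalanobis distance of each pixel from the target distribution."""
    diff = np.asarray(pixels, dtype=float) - target_mean
    # Solve a linear system instead of forming the inverse explicitly.
    whitened = np.linalg.solve(target_cov, diff.T).T
    return np.sum(diff * whitened, axis=1)

def detect(pixels, target_mean, target_cov, max_distance):
    """Flag pixels whose distance to the target is at or below the threshold."""
    return distance_to_target(pixels, target_mean, target_cov) <= max_distance

# Toy two-band example with an uncorrelated target covariance.
target_mean = np.array([0.4, 0.6])
target_cov = np.array([[0.01, 0.00],
                       [0.00, 0.04]])
pixels = np.array([[0.4, 0.6],    # exactly the mean
                   [0.5, 0.6],    # one standard deviation away in band 0
                   [0.4, 1.0]])   # two standard deviations away in band 1

distances = distance_to_target(pixels, target_mean, target_cov)
hits = detect(pixels, target_mean, target_cov, max_distance=2.0)
```

The three pixels lie at squared distances of about 0, 1, and 4, so a threshold of 2 keeps the first two and rejects the third.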

This  detector  can  be  applied  to  dismount  detection  by  using  human  skin  as  the 
target  signature.  Different  skin  types  have  different  spectral  reflectances,  as  pointed 
out  in  [37].  If  one  were  to  express  the  covariance  of  human  skin’s  spectral  reflectance, 
then  a  skin  detection  algorithm  could  be  created  that  is  invariant  to  skin  type.  It 
would  detect  fair  skin  and  dark  skin  with  the  same  likelihood,  making  it  useful  for 
scenarios  in  which  a  potential  dismount  may  be  of  unknown  ethnicity.

Furthermore,  the  covariance  of  the  normalized  illumination  transform  used  to 
generate  a  target  radiance  signature  could  also  be  derived  and  included  in  the  detector. 
The  MODTRAN-simulated  illumination  transforms  in  Fig.  9  could  serve  as  a  starting 
point  for  deriving  the  covariance  of  a  natural,  normalized  illumination  transform.  The 
detector  could  account  for  variability  in  the  target’s  spectral  reflectance  and  the  scene 
illumination  to  determine  the  expected  variability  of  the  target  signature.  This  could 
result  in  a  dismount  detector  that  is  more  robust  in  different  illumination  conditions 
and  more  diverse  in  the  types  of  skin  that  it  detects. 

5.2.2  Common  False-Alarm  Sources 

Whether  employing  the  NDSI  method  or  the  illumination-invariant  methods  from 
Chapter  4  for  dismount  detection,  there  are  some  materials  that  are  more  likely  than 
others  to  induce  false  alarms.  Certain  types  of  vegetation,  for  example,  cause  high 
false-alarm  rates  with  the  NDSI  detector.  The  pink  t-shirt  and  the  red  brick  in  Fig.  18
cause  false  alarms  for  the  LDA  and  ACE  detectors.  These  materials  are  misclassified
because  they  have  a  similar  spectral  signature  to  human  skin.

Future  work  could  identify  common  sources  of  false  alarms  in  dismount  detection 
algorithms  by  looking  for  spectrally  similar  materials.  It  is  important  to  know  whether  some
detection  algorithms  are  more  sensitive  than  others  to  certain  false-alarm  sources.
This  would  help  predict  which  algorithms  would  be  most  suitable  for  different  settings. 
A  dismount  detector  that  is  sensitive  to  bricks  and  other  synthetic  materials  should 
not  be  used  in  an  urban  environment.  A  dismount  detector  that  commonly  mistakes 
tree  bark  for  human  skin  should  not  be  used  in  a  forest.  Spectral  databases  like  the 
USGS  spectral  library  in  [8]  could  offer  a  starting  point  to  find  common  false-alarm 
sources  for  dismount  detectors. 
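
One simple way to start such a search is to rank library spectra by their spectral angle to a skin spectrum. The sketch below is a hedged illustration: the spectral angle is a common similarity measure but is not prescribed by this thesis, the library entries are made-up placeholders rather than USGS data, and all spectra are assumed to be resampled to the same bands.

```python
import numpy as np

def spectral_angle(a, b):
    """Angle in radians between two spectra; 0 means identical shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def rank_confusers(skin, library):
    """Sort library materials from most to least skin-like."""
    angles = {name: spectral_angle(skin, spec) for name, spec in library.items()}
    return sorted(angles.items(), key=lambda item: item[1])

# Placeholder spectra on four shared bands (not measured values).
skin = [0.3, 0.5, 0.6, 0.2]
library = {
    "material_c": [0.6, 0.2, 0.1, 0.7],
    "material_a": [0.6, 1.0, 1.2, 0.4],   # brighter copy of the skin shape
    "material_b": [0.1, 0.6, 0.5, 0.1],
}

ranking = rank_confusers(skin, library)
```

Materials at the top of the ranking are candidates for false alarms. Because the angle ignores overall brightness, a material with the same spectral shape as skin ranks first even if it is twice as bright.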

5.2.3  Band  Selection  for  Skin  Detection 

The  NDSI  skin  detection  method  with  NDGRI  false-alarm  suppression  requires
only  four  spectral  bands  [37].  Implementing  the  method  on  a  full  hyperspectral
camera  with  hundreds  of  spectral  bands  is  a  waste  of  sensor  capability.  A  specialized
multispectral  camera  responsive  to  only  the  required  spectral  bands  can  perform
dismount  detection  more  efficiently  [40].  In  order  to  justify  special  multispectral  cameras
for  dismount  detection,  those  four  bands  should  be  tested  against  other  possible  sets 
of  spectral  bands  for  optimality. 

Band  selection  methods  search  for  the  optimal  set  of  spectral  bands  for  a  specific 
target  signature  and  detection  algorithm  [7,  26,  52].  They  incorporate  information
about  the  target-to-background  contrast  and  the  availability  of  natural  illumination 
at  each  spectral  band.  It  is  possible  that  there  exists  a  three-band  combination  that 
could  provide  dismount  detection  results  comparable  to  the  NDSI  method.  There 
could  also  be  a  five-band  combination  that  detects  dismounts  so  well  that  it  is  worth 
the  cost  of  the  additional  spectral  channel. 

The  number  of  spectral  bands  is  one  factor  that  drives  the  cost  of  multispectral
cameras.  A  band  selection  study  could  find  the  optimal  dismount  detection
performance  for  a  given  number  of  spectral  bands.
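
A brute-force version of such a study can be sketched as below. It is an illustration only: the selection criterion (squared Mahalanobis distance between the target signature and the background mean on the chosen bands) is a stand-in and is not taken from [7, 26, 52], and the function names and toy statistics are hypothetical.

```python
from itertools import combinations

import numpy as np

def separability(bands, target, bg_mean, bg_cov):
    """Squared Mahalanobis distance between target and background on a band subset."""
    idx = list(bands)
    d = target[idx] - bg_mean[idx]
    sub_cov = bg_cov[np.ix_(idx, idx)]
    return float(d @ np.linalg.solve(sub_cov, d))

def best_band_subset(n_select, target, bg_mean, bg_cov):
    """Exhaustively score every subset of `n_select` bands and return the best."""
    n_bands = target.size
    return max(combinations(range(n_bands), n_select),
               key=lambda bands: separability(bands, target, bg_mean, bg_cov))

# Toy five-band scene: the target differs from the background in bands 2 and 3,
# and band 2 is the noisier of the two.
target = np.array([0.5, 0.5, 0.9, 0.1, 0.5])
bg_mean = np.full(5, 0.5)
bg_cov = np.diag([0.01, 0.01, 0.04, 0.01, 0.01])

best_one = best_band_subset(1, target, bg_mean, bg_cov)
best_two = best_band_subset(2, target, bg_mean, bg_cov)
```

Running the search for each band count gives the best achievable score per count, which is the performance-versus-cost curve described above. The exhaustive search grows combinatorially with the number of bands, so a sensor with hundreds of bands would need a greedy or otherwise pruned search.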

5.2.4  Band  Width  vs  Feature  Retention 

As  with  the  band  selection  study  suggested  in  Section  5.2.3,  the  appropriate
band  widths  should  also  be  considered  for  optimal  target  detection.  When  designing
a  multispectral  detection  system,  the  width  of  each  band  should  balance
two  opposing  factors.  On  one  hand,  the  bands  should  be  narrow  enough  to  retain 
the  spectral  features  that  distinguish  the  target  from  the  background.  On  the  other 
hand,  the  bands  should  be  wide  enough  to  allow  plenty  of  light  energy  to  pass  through 
and  increase  the  SNR.  There  may  be  valuable  features  at  wavelengths  that  receive 
very  weak  natural  illumination.  It  would  be  acceptable  to  widen  the  bands  at  such 
wavelengths. 

The  four  bands  used  in  the  NDSI  skin  detection  method  might  yield  better
performance  if  they  were  wider.  A  trade-off  study  could  find  the  optimal  band  widths  at
these  wavelengths  for  maximizing  the  average  skin  detection  performance.

5.2.5  Incorporate  Image  Processing  Techniques 

The  spectral  detection  methods  discussed  in  this  study  evaluate  each  image  pixel 
independently,  without  considering  its  relation  to  other  pixels  in  the  image.  Image
processing  techniques  like  morphological  filters  and  median  filters  can  correct  many 
of  the  false  alarms  generated  by  spectral  detection  methods.  A  single  stray  pixel  that 
is  marked  as  skin  can  be  suppressed  if  its  neighboring  pixels  are  not  skin.  This  might 
limit  the  effective  range  of  the  dismount  detector,  but  it  would  ensure  a  lower  false 
alarm  rate. 

Additionally,  a  real-time  dismount  detection  camera  can  compare  the  skin  pixels
between  consecutive  frames  to  filter  out  false  alarms  caused  by  noise.  A  temporal
low-pass  filter  can  suppress  stray  skin  pixels  that  appear  and  disappear  between  frames.
Combining  spectral,  spatial,  and  temporal  techniques  could  yield  better  dismount 
detection  performance  than  any  one  of  them  alone.
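
The spatial and temporal ideas above can be sketched with plain NumPy. This is a simplified illustration, not a tested design: the spatial step clears any detection with no detected 8-connected neighbor, which is a crude stand-in for a morphological or median filter, and the temporal step is a frame-count vote rather than a true low-pass filter. The function names and the 5 x 5 toy frames are hypothetical.

```python
import numpy as np

def suppress_isolated(mask):
    """Clear detections that have no detected pixel among their 8 neighbors."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.pad(mask, 1).astype(int)          # zero border around the frame
    rows, cols = mask.shape
    neighbors = sum(
        padded[1 + dr : 1 + dr + rows, 1 + dc : 1 + dc + cols]
        for dr in (-1, 0, 1) for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    return mask & (neighbors > 0)

def temporal_vote(masks, min_frames):
    """Keep pixels detected in at least `min_frames` of the stacked frames."""
    return np.sum(np.asarray(masks, dtype=bool), axis=0) >= min_frames

# Toy sequence: a 2 x 2 skin blob in every frame, plus a stray pixel in frame 1.
f0 = np.zeros((5, 5), dtype=bool)
f0[1:3, 1:3] = True
f1 = f0.copy()
f1[4, 4] = True
f2 = f0.copy()

cleaned = suppress_isolated(f1)
stable = temporal_vote([f0, f1, f2], min_frames=2)
```

Both steps remove the stray pixel at (4, 4) and keep the blob. The spatial step would also erase a genuine one-pixel skin detection, which illustrates the loss of effective range noted above.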

5.3  Contributions 

This  thesis  has  shown  that,  with  only  a  slight  decrease  in  performance,  dismount
detection  can  be  applied  in  real  time  to  hyperspectral  data  without  using  calibration
panels,  even  as  the  illumination  in  the  scene  changes.  By  incorporating  some
assumptions  about  the  expected  operating  conditions  of  a  real-time  dismount  detection
system,  the  effects  of  changing  scene  illumination  can  be  mitigated.  This  development 
advances  hyperspectral  dismount  detection  from  only  being  applicable  in  stationary 
scenes  with  calibration  panels,  to  being  implemented  on  a  snapshot  HSI  camera  as 
it  pans  across  a  scene  in  unknown  illumination  conditions.  Detecting  human  skin 
without  performing  radiometric  calibration  moves  real-time  hyperspectral  dismount 
detection  towards  becoming  a  deployable  capability. 